<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; 错配</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e9%94%99%e9%85%8d/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>欢迎去论坛biotrainee.com留言参与讨论，或者关注同名微信公众号biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>perl用递归法得到允许多个错配的模式匹配</title>
		<link>http://www.bio-info-trainee.com/924.html</link>
		<comments>http://www.bio-info-trainee.com/924.html#comments</comments>
		<pubDate>Thu, 30 Jul 2015 01:19:42 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[递归]]></category>
		<category><![CDATA[错配]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=924</guid>
		<description><![CDATA[如果我要匹配一个字符串$a="ATTCCGGGAT";那么直接在shell里面g &#8230; <a href="http://www.bio-info-trainee.com/924.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>如果我要匹配一个字符串$a="ATTCCGGGAT";那么直接在shell里面grep它即可，写脚本也行$seq =~ /$a/;，但是如果我想查找这个字符串的模糊匹配，允许一个错配的情况，那么就非常多了！这时候简单的匹配已经不能达到目的，但是我们仍然可以用perl强大的正则匹配功能达到目的。</p>
<p>比如，它的匹配模式应该:</p>
<p>/.TTCCGGGAT/</p>
<p>/A.TCCGGGAT/</p>
<p>/AT.CCGGGAT/等等，可以把这些模式都综合起来就是下面这个</p>
<p>$b=(?^:(?:A(?:T(?:T(?:C(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT)|.CGGGAT)|.CCGGGAT)|.TCCGGGAT)|.TTCCGGGAT))</p>
<p>所以我们就应该通过程序来生成这个字符串，然后用</p>
<p>$seq =~ /$b/;来替代$seq =~ /$a/;</p>
<p>而允许两个错配的格式就更复杂了：</p>
<p>(?^:(?:A(?:T(?:T(?:C(?:C(?:G(?:G(?:G..|.(?:A.|.T))|.(?:G(?:A.|.T)|.AT))|.(?:G(?:G(?:A.|.T)|.AT)|.GAT))|.(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT))|.(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT))|.(?:C(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT)|.CGGGAT))|.(?:T(?:C(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT)|.CGGGAT)|.CCGGGAT))|.(?:T(?:T(?:C(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT)|.CGGGAT)|.CCGGGAT)|.TCCGGGAT)))</p>
<p>[perl]</p>
<p>$a=&quot;ATTCCGGGAT&quot;;<br />
$one_match=fuzzy_pattern($a,1);<br />
print &quot;$one_match\n&quot;;</p>
<p>sub fuzzy_pattern {<br />
my ($original_pattern, $mismatches_allowed) = @_;<br />
$mismatches_allowed &gt;= 0<br />
or die &quot;Number of mismatches must be greater than or equal to zero\n&quot;;<br />
my $new_pattern = make_approximate($original_pattern, $mismatches_allowed);<br />
return qr/$new_pattern/;<br />
}<br />
sub make_approximate {<br />
my ($pattern, $mismatches_allowed) = @_;<br />
if ($mismatches_allowed == 0) { return $pattern }<br />
elsif (length($pattern) &lt;= $mismatches_allowed)<br />
{ $pattern =~ tr/ACTG/./; return $pattern }<br />
else {<br />
my ($first, $rest) = $pattern =~ /^(.)(.*)/;<br />
my $after_match = make_approximate($rest, $mismatches_allowed);<br />
if ($first =~ /[ACGT]/) {<br />
my $after_miss = make_approximate($rest, $mismatches_allowed-1);<br />
return &quot;(?:$first$after_match|.$after_miss)&quot;;<br />
}<br />
else { return &quot;$first$after_match&quot; }<br />
}<br />
}</p>
<p>[/perl]</p>
<p>只需要控制$one_match=fuzzy_pattern($a,1);里面的参数即可控制自己想要的匹配情况。</p>
<p>然后把生成的匹配模式用了进行序列匹配$seq =~ /$one_mismatch/;</p>
<p>这个程序的重点就是解析需要生成的匹配字符串规则，然后用递归来生成这个匹配字符串。</p>
<p>这种匹配，在引物搜索特别有用。</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/924.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
