<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; 递归</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e9%80%92%e5%bd%92/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>欢迎去论坛biotrainee.com留言参与讨论，或者关注同名微信公众号biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>perl用递归法得到允许多个错配的模式匹配</title>
		<link>http://www.bio-info-trainee.com/924.html</link>
		<comments>http://www.bio-info-trainee.com/924.html#comments</comments>
		<pubDate>Thu, 30 Jul 2015 01:19:42 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[递归]]></category>
		<category><![CDATA[错配]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=924</guid>
		<description><![CDATA[如果我要匹配一个字符串$a="ATTCCGGGAT";那么直接在shell里面g &#8230; <a href="http://www.bio-info-trainee.com/924.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>如果我要匹配一个字符串$a="ATTCCGGGAT";那么直接在shell里面grep它即可，写脚本也行$seq =~ /$a/;，但是如果我想查找这个字符串的模糊匹配，允许一个错配的情况，那么就非常多了！这时候简单的匹配已经不能达到目的，但是我们仍然可以用perl强大的正则匹配功能达到目的。</p>
<p>比如，它的匹配模式应该:</p>
<p>/.TTCCGGGAT/</p>
<p>/A.TCCGGGAT/</p>
<p>/AT.CCGGGAT/等等，可以把这些模式都综合起来就是下面这个</p>
<p>$b=(?^:(?:A(?:T(?:T(?:C(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT)|.CGGGAT)|.CCGGGAT)|.TCCGGGAT)|.TTCCGGGAT))</p>
<p>所以我们就应该通过程序来生成这个字符串，然后用</p>
<p>$seq =~ /$b/;来替代$seq =~ /$a/;</p>
<p>而允许两个错配的格式就更复杂了：</p>
<p>(?^:(?:A(?:T(?:T(?:C(?:C(?:G(?:G(?:G..|.(?:A.|.T))|.(?:G(?:A.|.T)|.AT))|.(?:G(?:G(?:A.|.T)|.AT)|.GAT))|.(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT))|.(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT))|.(?:C(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT)|.CGGGAT))|.(?:T(?:C(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT)|.CGGGAT)|.CCGGGAT))|.(?:T(?:T(?:C(?:C(?:G(?:G(?:G(?:A.|.T)|.AT)|.GAT)|.GGAT)|.GGGAT)|.CGGGAT)|.CCGGGAT)|.TCCGGGAT)))</p>
<p>[perl]</p>
<p>$a=&quot;ATTCCGGGAT&quot;;<br />
$one_match=fuzzy_pattern($a,1);<br />
print &quot;$one_match\n&quot;;</p>
<p>sub fuzzy_pattern {<br />
my ($original_pattern, $mismatches_allowed) = @_;<br />
$mismatches_allowed &gt;= 0<br />
or die &quot;Number of mismatches must be greater than or equal to zero\n&quot;;<br />
my $new_pattern = make_approximate($original_pattern, $mismatches_allowed);<br />
return qr/$new_pattern/;<br />
}<br />
sub make_approximate {<br />
my ($pattern, $mismatches_allowed) = @_;<br />
if ($mismatches_allowed == 0) { return $pattern }<br />
elsif (length($pattern) &lt;= $mismatches_allowed)<br />
{ $pattern =~ tr/ACTG/./; return $pattern }<br />
else {<br />
my ($first, $rest) = $pattern =~ /^(.)(.*)/;<br />
my $after_match = make_approximate($rest, $mismatches_allowed);<br />
if ($first =~ /[ACGT]/) {<br />
my $after_miss = make_approximate($rest, $mismatches_allowed-1);<br />
return &quot;(?:$first$after_match|.$after_miss)&quot;;<br />
}<br />
else { return &quot;$first$after_match&quot; }<br />
}<br />
}</p>
<p>[/perl]</p>
<p>只需要控制$one_match=fuzzy_pattern($a,1);里面的参数即可控制自己想要的匹配情况。</p>
<p>然后把生成的匹配模式用了进行序列匹配$seq =~ /$one_mismatch/;</p>
<p>这个程序的重点就是解析需要生成的匹配字符串规则，然后用递归来生成这个匹配字符串。</p>
<p>这种匹配，在引物搜索特别有用。</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/924.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>perl多重循环生成全排列精简</title>
		<link>http://www.bio-info-trainee.com/913.html</link>
		<comments>http://www.bio-info-trainee.com/913.html#comments</comments>
		<pubDate>Fri, 24 Jul 2015 10:09:02 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[全排列]]></category>
		<category><![CDATA[循环]]></category>
		<category><![CDATA[算法]]></category>
		<category><![CDATA[递归]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=913</guid>
		<description><![CDATA[我想输出ATCG四个字符，组成一个12个字符长度字符串的全排列。共4^12=16 &#8230; <a href="http://www.bio-info-trainee.com/913.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>我想输出ATCG四个字符，组成一个12个字符长度字符串的全排列。共4^12=16777216种排列，</p>
<p>AAAAAAAAAAAA</p>
<p>AAAAAAAAAAAT</p>
<p>AAAAAAAAAAAC</p>
<p>AAAAAAAAAAAG</p>
<p>AAAAAAAAAATA</p>
<p>`````````````````````````</p>
<p>按照正常的想法是通过多重循环来生成全排列，所以我写下了下面这样的代码，但是有个问题，它不支持多扩展性，如果100个全排列，那么得写一百次循环嵌套。</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/4.png"><img class="alignnone size-full wp-image-916" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/4.png" alt="4" width="914" height="451" /></a></p>
<p>所以我求助了perl-china群里的高手，其中一个给了一个结合二进制的巧妙解决方案。</p>
<p>perl -le '{$a{"00"}="A";$a{"01"}="T";$a{"10"}="C";$a{"11"}="G";for(0..100){print join"",map{$a{$_}} unpack("a2"x12,sprintf("%024b\n",$_))}}'</p>
<p>AAAAAAAAAAAA</p>
<p>AAAAAAAAAAAT</p>
<p>AAAAAAAAAAAC</p>
<p>AAAAAAAAAAAG</p>
<p>AAAAAAAAAATA</p>
<p>AAAAAAAAAATT</p>
<p>AAAAAAAAAATC</p>
<p>AAAAAAAAAATG</p>
<p>AAAAAAAAAACA</p>
<p>AAAAAAAAAACT</p>
<p>AAAAAAAAAACC</p>
<p>AAAAAAAAAACG</p>
<p>AAAAAAAAAAGA</p>
<p>AAAAAAAAAAGT</p>
<p>AAAAAAAAAAGC</p>
<p>AAAAAAAAAAGG</p>
<p>AAAAAAAAATAA</p>
<p>AAAAAAAAATAT</p>
<p>这里我简单解释一下这个代码</p>
<p>perl -le '{$a=sprintf("%024b\n",1);print $a}'</p>
<p>000000000000000000000001</p>
<p>这个sprintf是根据024b的格式把我们的十进制数字转为二级制，但是补全为24位。</p>
<p>perl -le '{@a=unpack("a2"x12,sprintf("%024b\n",100));print join":", @a}'</p>
<p>00:00:00:00:00:00:00:00:01:10:01:00</p>
<p>unpack这个函数很简单，就是把二进制的数值拆分成字符串数组，两个数字组成一个字符串。</p>
<p>最后通过map把$a{"00"}="A";$a{"01"}="T";$a{"10"}="C";$a{"11"}="G";对应成hash值并且输出即可！</p>
<p>然后这个大神又提成了一个递归的办法来解决</p>
<p>[perl]</p>
<p>#!/usr/bin/perl</p>
<p>my $arr = [];</p>
<p>$arr-&gt;[$_] = 0 for (0..11);<br />
my @b = (&quot;A&quot;,&quot;C&quot;,&quot;G&quot;,&quot;T&quot;);</p>
<p>while(1){<br />
print join &quot;&quot;, map {$b[$_]} @$arr;<br />
print &quot;\n&quot;;<br />
exit unless &amp;count($arr,11);<br />
}</p>
<p>sub count {<br />
my $arr = shift;<br />
my $x = shift;</p>
<p>return 0 if $x &lt; 0;</p>
<p>if ($arr-&gt;[$x] &lt; 3){<br />
$arr-&gt;[$x]++;<br />
return 1;<br />
}</p>
<p>else{<br />
$arr-&gt;[$x] = 0;<br />
return &amp;count($arr,$x - 1);<br />
}<br />
}</p>
<p>[/perl]</p>
<p>还有另外一个大神也提成了一个递归的算法解决，算法之道还是蛮有趣的</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/2.jpg"><img class="alignnone size-full wp-image-915" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/2.jpg" alt="2" width="401" height="466" /></a></p>
<p>最后还补充一个也是计算机技巧的解决方案，也是群里的朋友提出来的。</p>
<p>[perl]</p>
<p>#!/usr/bin/perl -w</p>
<p>my @A = qw(A T G C);<br />
my $N = 12;</p>
<p>foreach my $n (0..4**$N){<br />
foreach my $index (0..$N){<br />
print $A[$n%4];<br />
$n = $n &gt;&gt;2;<br />
}<br />
print &quot;\n&quot;;<br />
}</p>
<p>[/perl]</p>
<p>这个是位运算，如果C语言基础学的好的同学很快就能看懂的。</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/913.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
