<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Bases</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e7%a2%b1%e5%9f%ba/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Simulated sequencing of the lambda_virus genome</title>
		<link>http://www.bio-info-trainee.com/853.html</link>
		<comments>http://www.bio-info-trainee.com/853.html#comments</comments>
		<pubDate>Thu, 16 Jul 2015 09:18:41 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[simulated sequencing]]></category>
		<category><![CDATA[bases]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=853</guid>
		<description><![CDATA[The lambda_virus genome file is test data that ships with bowtie; it is 48502 &#8230; <a href="http://www.bio-info-trainee.com/853.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h5>The lambda_virus genome file is test data that ships with bowtie; it is 48502 bp long. First, I used a script to cut it into every possible overlapping fragment.</h5>
<p>perl -alne '{next if /^&gt;/;$a.=$_;}END{$len=length $a;print substr($a.$a,$_,120) foreach 0..$len}' lambda_virus.fa &gt;lamb_virus.120bp</p>
<p>Every fragment is 120 bp long.</p>
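<p>The one-liner above can also be written as a small reusable routine. What follows is only a sketch: the name circular_windows is made up for illustration, and it treats the genome as circular, as the one-liner does by doubling the sequence. Unlike the one-liner, which loops over 0..$len and therefore prints the first fragment a second time at the end, it emits exactly one fragment per start position.</p>
```perl
use strict;
use warnings;

# Hypothetical helper: all windows of length $k over a circular sequence,
# one window per start position.
sub circular_windows {
    my ($genome, $k) = @_;
    my $len = length $genome;
    # Doubling the sequence lets windows wrap past the end
    # (this needs $k to be no larger than $len).
    my $doubled = $genome . $genome;
    return map { substr($doubled, $_, $k) } 0 .. $len - 1;
}

# Toy example: a 6 bp "genome" cut into 4 bp windows
my @fragments = circular_windows('ACGTAC', 4);
print "$_\n" for @fragments;
```
<p>For the real data the call would be circular_windows($genome, 120), with $genome holding the concatenated sequence lines of lambda_virus.fa.</p>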
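<p>My sequencing strategy is to add the bases in the order CTAG, repeated 30 times, so that 120 bases are added in total.</p>
<p>For each 120 bp fragment, an added base is incorporated whenever it is complementary to the template, and this continues until all 120 additions have been made. With enough luck, some fragments could be extended across the full 120 bp, but if A, T, C, and G are evenly distributed within each 120 bp fragment, every fragment should gain about 30 bp.</p>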
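<p>To make the rule concrete, here is a minimal flow-by-flow sketch. It assumes that each addition incorporates at most one base and that templates contain only uppercase A, C, G, and T. The name simulate_flows and its parameters are hypothetical, and this is not the script that produced the results in this post.</p>
```perl
use strict;
use warnings;

# Hypothetical helper: count how many template bases are extended within
# $cycles repeats of $flow_order, assuming at most one base per flow.
sub simulate_flows {
    my ($template, $flow_order, $cycles) = @_;
    my %complement = (A => 'T', T => 'A', C => 'G', G => 'C');
    my @flows  = split //, $flow_order;
    my $nflows = scalar @flows;
    my $pos    = 0;    # next template position waiting for a base
    for my $n (1 .. $cycles * $nflows) {
        last if $pos == length $template;
        my $flowed = $flows[ ($n - 1) % $nflows ];
        my $needed = $complement{ substr($template, $pos, 1) } // '';
        $pos++ if $flowed eq $needed;
    }
    return $pos;
}

print simulate_flows('GATC' x 30, 'CTAG', 30), "\n";   # prints 120
print simulate_flows('G' x 120,  'CTAG', 30), "\n";    # prints 30
```
<p>A template that happens to follow the flow order gains one base on every flow, while a homopolymer gains only one base per cycle of four flows.</p>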
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0011.png"><img class="alignnone  wp-image-854" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0011.png" alt="image001" width="1048" height="645" /></a></p>
<p>[perl]<br />
# length.pl: for each fragment, print the index at which the running cost first exceeds 120<br />
use strict; use warnings;<br />
while (my $seq = &lt;&gt;) {</p>
<p>chomp $seq; my $sum = 0;</p>
<p>foreach my $i (0..120) {</p>
<p>my $str = substr($seq, $i, 2);</p>
<p># each adjacent pair of bases adds a cost between 1 and 4</p>
<p>if ($str eq &quot;GG&quot; || $str eq &quot;CC&quot; || $str eq &quot;AA&quot; || $str eq &quot;TT&quot;) { $sum += 4; }</p>
<p>elsif ($str eq &quot;GT&quot; || $str eq &quot;CG&quot; || $str eq &quot;AC&quot; || $str eq &quot;TA&quot;) { $sum += 3; }</p>
<p>elsif ($str eq &quot;GA&quot; || $str eq &quot;CT&quot; || $str eq &quot;AG&quot; || $str eq &quot;TC&quot;) { $sum += 2; }</p>
<p>else { $sum += 1; }</p>
<p># stop once the budget of 120 additions is used up</p>
<p>if ($sum &gt; 120) { print &quot;$i\n&quot;; last; }</p>
<p>}</p>
<p>}</p>
<p>[/perl]</p>
<p>perl length.pl lamb_virus.120bp &gt;length.txt</p>
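<p>length.txt holds one value per fragment, so it has to be tallied before it can be shown as a table. The post does not say how that was done; a minimal sketch, with the hypothetical helper name tally_lengths and a tiny made-up input, could look like this:</p>
```perl
use strict;
use warnings;

# Hypothetical helper: count how often each length occurs.
sub tally_lengths {
    my @lengths = @_;
    my %count;
    $count{$_}++ for @lengths;
    return %count;
}

# Tiny in-memory demo; for real data, pass the chomped lines of length.txt.
my %count = tally_lengths(48, 47, 48, 49, 48);
print "$_\t$count{$_}\n" for sort { $a <=> $b } keys %count;
```
<p>Each output line is a length followed by its count, sorted by length.</p>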
<p>The results are as follows:</p>
<p>&nbsp;</p>
<table>
<tbody>
<tr>
<td width="35">Length</td>
<td width="31">36</td>
<td width="31">37</td>
<td width="31">38</td>
<td width="31">39</td>
<td width="30">40</td>
<td width="30">41</td>
<td width="30">42</td>
<td width="31">43</td>
<td width="31">44</td>
<td width="31">45</td>
<td width="31">46</td>
<td width="31">47</td>
<td width="31">48</td>
<td width="31">49</td>
<td width="31">50</td>
<td width="31">51</td>
</tr>
<tr>
<td width="35">Count</td>
<td width="31">2</td>
<td width="31">19</td>
<td width="31">34</td>
<td width="31">110</td>
<td width="30">204</td>
<td width="30">432</td>
<td width="30">878</td>
<td width="31">1495</td>
<td width="31">2237</td>
<td width="31">3202</td>
<td width="31">4343</td>
<td width="31">5179</td>
<td width="31">5697</td>
<td width="31">5429</td>
<td width="31">4865</td>
<td width="31">4214</td>
</tr>
<tr>
<td width="35">Length</td>
<td width="31">52</td>
<td width="31">53</td>
<td width="31">54</td>
<td width="31">55</td>
<td width="30">56</td>
<td width="30">57</td>
<td width="30">58</td>
<td width="31">59</td>
<td width="31">60</td>
<td width="31">61</td>
<td width="31">62</td>
<td width="31">63</td>
<td width="31">64</td>
<td width="31"></td>
<td width="31"></td>
<td width="31"></td>
</tr>
<tr>
<td width="35">Count</td>
<td width="31">3249</td>
<td width="31">2499</td>
<td width="31">1735</td>
<td width="31">1090</td>
<td width="30">657</td>
<td width="30">396</td>
<td width="30">228</td>
<td width="31">141</td>
<td width="31">90</td>
<td width="31">48</td>
<td width="31">18</td>
<td width="31">9</td>
<td width="31">3</td>
<td width="31"></td>
<td width="31"></td>
<td width="31"></td>
</tr>
</tbody>
</table>
<p>The table shows that most of the simulated read lengths fall between 46 bp and 51 bp.</p>
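<p>A rough sanity check on where the peak falls: if all 16 dinucleotides were equally likely (an assumption, not something measured here), the average cost per base under the scores in length.pl would be 2.5, so a budget of 120 would run out after about 48 bases. That agrees with the most frequent length in the table. The arithmetic, as a short sketch:</p>
```perl
use strict;
use warnings;

# Dinucleotide costs as used in length.pl
my %cost;
$cost{$_} = 4 for qw(GG CC AA TT);
$cost{$_} = 3 for qw(GT CG AC TA);
$cost{$_} = 2 for qw(GA CT AG TC);
$cost{$_} = 1 for qw(TG GC CA AT);

my $total = 0;
$total += $_ for values %cost;

# Assumes every dinucleotide is equally likely
my $mean_cost = $total / scalar(keys %cost);
my $expected  = 120 / $mean_cost;

printf "mean cost per base: %.1f, expected length: about %.0f\n",
    $mean_cost, $expected;
# prints "mean cost per base: 2.5, expected length: about 48"
```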
<p>The box plot is shown below:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0031.png"><img class="alignnone size-full wp-image-855" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0031.png" alt="image003" width="613" height="505" /></a></p>
<p>The bar chart is shown below:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image005.png"><img class="alignnone size-full wp-image-856" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image005.png" alt="image005" width="619" height="499" /></a></p>
<p>&nbsp;</p>
<p>Next, I simulated a 6000 bp genome and ran the same simulated sequencing on it to see how the read lengths are distributed:</p>
<table>
<tbody>
<tr>
<td width="53">Length</td>
<td width="29">39</td>
<td width="29">40</td>
<td width="29">41</td>
<td width="36">42</td>
<td width="36">43</td>
<td width="36">44</td>
<td width="36">45</td>
<td width="36">46</td>
<td width="36">47</td>
<td width="36">48</td>
<td width="36">49</td>
<td width="36">50</td>
<td width="36">51</td>
<td width="36">52</td>
<td width="36">53</td>
<td width="36">54</td>
<td width="29">55</td>
<td width="29">56</td>
<td width="29">57</td>
<td width="29">58</td>
</tr>
<tr>
<td width="53">Count</td>
<td width="29">9</td>
<td width="29">22</td>
<td width="29">96</td>
<td width="36">207</td>
<td width="36">322</td>
<td width="36">382</td>
<td width="36">479</td>
<td width="36">671</td>
<td width="36">770</td>
<td width="36">714</td>
<td width="36">706</td>
<td width="36">546</td>
<td width="36">424</td>
<td width="36">232</td>
<td width="36">182</td>
<td width="36">100</td>
<td width="29">52</td>
<td width="29">30</td>
<td width="29">14</td>
<td width="29">21</td>
</tr>
<tr>
<td width="53">Length</td>
<td width="29">59</td>
<td width="29">60</td>
<td width="29">61</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="29">&nbsp;</td>
<td width="29">&nbsp;</td>
<td width="29">&nbsp;</td>
<td width="29">&nbsp;</td>
</tr>
<tr>
<td width="53">Count</td>
<td width="29">15</td>
<td width="29">5</td>
<td width="29">2</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="36">&nbsp;</td>
<td width="29">&nbsp;</td>
<td width="29">&nbsp;</td>
<td width="29">&nbsp;</td>
<td width="29">&nbsp;</td>
</tr>
</tbody>
</table>
<p>This time the read lengths are concentrated between 45 bp and 51 bp.</p>
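<p>The post does not say how the 6000 bp genome was simulated. One simple possibility, assuming the four bases are drawn independently with equal probability, is sketched here; the helper name random_genome and the FASTA header are hypothetical.</p>
```perl
use strict;
use warnings;

# Hypothetical helper: a random sequence in which A, C, G and T are
# equally likely at every position.
sub random_genome {
    my ($length) = @_;
    my @bases = qw(A C G T);
    return join '', map { $bases[ int rand @bases ] } 1 .. $length;
}

my $genome = random_genome(6000);
print ">simulated_6000bp\n$genome\n";
```
<p>Redirecting the output to a file gives a single-record FASTA that can be fed to the same fragmenting one-liner and to length.pl.</p>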
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/853.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
