<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Data alignment</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e6%95%b0%e6%8d%ae%e6%af%94%e5%af%b9/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Self-study miRNA-seq analysis, part 4: aligning the sequencing data</title>
		<link>http://www.bio-info-trainee.com/1709.html</link>
		<comments>http://www.bio-info-trainee.com/1709.html#comments</comments>
		<pubDate>Sat, 25 Jun 2016 09:25:10 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[tutorial]]></category>
		<category><![CDATA[hairpin]]></category>
		<category><![CDATA[miRBase]]></category>
		<category><![CDATA[miRNA-seq]]></category>
		<category><![CDATA[Data alignment]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1709</guid>
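		<description><![CDATA[Sequence alignment is at the core of most types of data analysis, and if you want to make good use of sequencing data, the details of the alignment matter a great deal. Here I &#8230; <a href="http://www.bio-info-trainee.com/1709.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>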
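				<content:encoded><![CDATA[<p>Sequence alignment is at the core of most types of data analysis, and if you want to make good use of sequencing data, the details of the alignment matter a great deal. Here I am only working through one paper, so I have not worried much about those details; I simply list my code and a few of my own thoughts, aiming to reproduce the authors' results. There are two alignment strategies for miRNA-seq data: one is to download the known miRNA sequences from miRBase and align against them; the other is to align directly to the reference genome (hg19/hg38 for human). The first is very simple and makes it easy to count the expression of every known miRNA sequence. The second takes longer and is less convenient for computing expression, but it has the advantage that it can be used to predict novel miRNAs, so most papers follow both routes.<span id="more-1709"></span></p>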
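<p>To make the "easy to count" point concrete, here is a minimal sketch of counting aligned reads per miRBase sequence from a SAM file. It assumes single-end reads and an aligner that reports one alignment per read (bowtie2's default mode); the function name <code>count_per_ref</code> is my own and is not part of any tool.</p>

```shell
# Count aligned reads per reference sequence in a SAM file.
# Assumptions: header lines start with "@", and unmapped reads
# carry "*" in column 3 (RNAME), as the SAM format specifies.
count_per_ref() {
    awk -F '\t' '$0 !~ /^@/ && $3 != "*" { n[$3]++ }
                 END { for (r in n) print r "\t" n[r] }' "$1" | sort
}
```

<p>Calling <code>count_per_ref</code> on one of the hairpin SAM files produced later in this post prints one line per precursor with its read count. This is only a raw tally; it does not separate the mature forms that come from the same precursor.</p>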
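<p>The paper uses SHRiMP, a fairly niche tool. I did not pay attention to that at first and simply used bowtie2. As for the reference, because of server constraints I did not use the genome and instead used the human reference sequences downloaded from miRBase. In the current miRBase release, human has 2588 known mature miRNA sequences and 1881 precursor miRNA sequences. The code I used for the download (done in June 2016) is in <a href="http://www.bio-info-trainee.com/1697.html">Self-study miRNA-seq analysis, part 2: collecting learning materials</a>, and the software and sequences used for the alignment below are covered in my post <a href="http://www.bio-info-trainee.com/1703.html">Self-study miRNA-seq analysis, part 3: downloading public sequencing data</a>.</p>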
<blockquote>
<div>## step5 : alignment to miRBase v21 (hairpin.human.fa/mature.human.fa )</div>
<div>#### step5.1 using bowtie2 to do alignment</div>
<div></div>
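<div>## run from inside the miRBase folder, next to hairpin.human.fa and mature.human.fa</div>
<div>mkdir  bowtie2_index &amp;&amp;  cd bowtie2_index</div>
<div>~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../hairpin.human.fa hairpin_human</div>
<div>~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../mature.human.fa  mature_human</div>
<div>## the index paths below assume we are back in the folder that holds miRBase/ and the *_clean.fq.gz files</div>
<div>cd ../..</div>
<div>ls *_clean.fq.gz | while read id ; do  ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U "$id"   -S "${id%%.*}.hairpin.sam" ; done</div>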
<div><strong>## overall alignment rate:  10.20% / 5.71%/ 10.18%/ 4.36% / 10.02% / 4.95%  (before convert U to T )</strong></div>
<div><strong>## overall alignment rate:  51.77% / 70.38%/51.45% /61.14%/ 52.20% / 65.85% (after convert U to T )</strong></div>
<div></div>
<div>ls *_clean.fq.gz | while read id ; do  ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/mature_human  -U $id   -S ${id%%.*}.mature.sam ; done</div>
<div><strong>## overall alignment rate:  6.67% / 3.78% / 6.70% / 2.80%/ 6.55% / 3.23%    (before convert U to T )</strong></div>
<div><strong>## overall alignment rate:  34.94% / 46.16%/ 35.00%/ 38.50% / 35.46% /42.41%(after convert U to T )</strong></div>
<div></div>
<div>#### step5.2 using SHRiMP to do alignment</div>
<div>##    <a href="http://compbio.cs.toronto.edu/shrimp/README">http://compbio.cs.toronto.edu/shrimp/README</a></div>
<div>##    3.5 Mapping cDNA reads against a miRNA database</div>
<div>cd ~/biosoft/SHRiMP/SHRiMP_2_2_3</div>
<div>export SHRIMP_FOLDER=$PWD</div>
<div>cd -</div>
<div>## We project the database with:</div>
<div>$SHRIMP_FOLDER/utils/project-db.py --seed 00111111001111111100,00111111110011111100,00111111111100111100,00111111111111001100,00111111111111110000 \</div>
<div> --h-flag --shrimp-mode ls miRBase/hairpin.human.fa</div>
<div>##</div>
<div>$SHRIMP_FOLDER/bin/gmapper-ls -L  hairpin.human-ls SRR1542716.fastq  \</div>
<div>-o 1 -H -E -a -1 -q -30 -g -30 --qv-offset 33 --strata -N 8  &gt;map.out 2&gt;map.log</div>
</blockquote>
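<p>The bowtie2 loops above name each output file with <code>${id%%.*}</code>, which strips everything from the first dot onward. A small sketch of that naming rule follows; the helper name <code>sam_name</code> is hypothetical and only illustrates the expansion.</p>

```shell
# ${1%%.*} removes the longest suffix that starts with a dot,
# i.e. everything from the first "." in the file name onward.
sam_name() {
    printf '%s.%s.sam\n' "${1%%.*}" "$2"
}

sam_name SRR1542716_clean.fq.gz hairpin   # prints SRR1542716_clean.hairpin.sam
```

<p>One thing to watch: because the cut happens at the first dot, a path such as <code>./sample.fq.gz</code> would expand to an empty prefix, so the loop is best run on bare file names, as it is here.</p>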
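<p>As you can see, aligning the reads to the precursor miRNAs and to the mature miRNAs gives different results (after the U-to-T conversion, roughly 51% to 70% overall alignment rate against the hairpins versus 35% to 46% against the mature sequences). One precursor miRNA can give rise to several mature miRNAs, and not every mature form is recorded in the database, so aligning to the precursor miRNA database is generally recommended. That also makes it possible to predict new mature miRNAs, which is very worthwhile.</p>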
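<p>There is one more very important point: as you can see, the alignment rate differs enormously before and after I converted U to T. miRBase distributes its sequences as RNA (written with U), while the sequencing reads are written with T, so aligning against the unconverted file was a really silly mistake on my part, and I will not dwell on it. Having got this far, though, the results can already be checked against the paper, which reports its alignment rates and the sequences that aligned.</p>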
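<p>The U-to-T conversion itself is a one-liner. The sketch below is one possible way to do it, not necessarily the command I originally ran; the function name <code>fasta_u_to_t</code> is my own. It touches only the sequence lines, so any "U" in a header line is left alone.</p>

```shell
# Convert RNA letters to DNA letters in a FASTA file.
# Header lines (starting with ">") are printed unchanged;
# on sequence lines every U becomes T and every u becomes t.
fasta_u_to_t() {
    awk '/^>/ { print; next }
         { gsub(/U/, "T"); gsub(/u/, "t"); print }' "$1"
}
```

<p>Redirect the output to a new FASTA file (the name is up to you) and build the bowtie2 index from that converted file rather than from the original download.</p>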
<p>I also came across this information in a blog post:</p>
<p>Thank you so  much!. Yes I contacted the lab-guy and he just said that trimmed the first 4 bp and last 4bp. ( as you found)</p>
<p>So I firstly <strong>trimmed the adapter sequences</strong>(TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC)</p>
<p>And then, <strong>trimmed the first 4bp and last 4bp</strong> from reads, which <strong>leads to the 22bp peak of read-length distribution(instead of 24bp)</strong></p>
<p>Anyhow, I tried to map with bowtie2 again.</p>
<p><strong>&gt; </strong><strong>bowtie2 --local -N 1 -L 16</strong></p>
<p><strong>-x ../miRNA_reference/<span style="color: #ff00ff;">hairpin_UtoT.fa</span></strong></p>
<p><strong>-U first4bptrimmed_A1-SmallRNA_S1_L001_R1_001_Illuminaadpatertrim.fastq</strong></p>
<p><strong>-S f4_trimmed.sam</strong></p>
<p>&nbsp;</p>
<p><strong>I also changed hairpin.fa file (U to T) </strong></p>
<p>Oh.. thank you David,</p>
<p>Finally, I got</p>
<p>2565353 reads; of these:<br />
2565353 (100.00%) were unpaired; of these:<br />
479292 (18.68%) aligned 0 times<br />
11959 (0.47%) aligned exactly 1 time<br />
2074102 (80.85%) aligned &gt;1 times<br />
<strong>81.32% overall alignment rate</strong></p>
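<p>To compare rates like these across samples, or against the numbers reported in the paper, it helps to pull the figure out of each log automatically. A minimal sketch, assuming the bowtie2 summary (which bowtie2 writes to standard error) was saved to a file; the function name <code>overall_rate</code> is hypothetical.</p>

```shell
# Print the percentage from a bowtie2 summary line such as
# "81.32% overall alignment rate". $1 is the saved log file.
overall_rate() {
    awk '/overall alignment rate/ { sub(/%.*/, "", $1); print $1 }' "$1"
}
```

<p>Run on a log holding the summary quoted above, this prints the overall rate without the percent sign, which makes it easy to collect one value per sample into a table.</p>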
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1709.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
