<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Expression level</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e8%a1%a8%e8%be%be%e9%87%8f/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>A hands-on RNA-seq walkthrough: super simple, done in 2 hours!</title>
		<link>http://www.bio-info-trainee.com/2218.html</link>
		<comments>http://www.bio-info-trainee.com/2218.html#comments</comments>
		<pubDate>Fri, 30 Dec 2016 08:38:33 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Transcriptome software]]></category>
		<category><![CDATA[RNA-seq]]></category>
		<category><![CDATA[Expression level]]></category>
		<category><![CDATA[Transcriptome]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2218</guid>
		<description><![CDATA[Please do not copy my code directly. Work through it yourself, type it out, and think about why I wrote it this way. Software &#8230; <a href="http://www.bio-info-trainee.com/2218.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
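				<content:encoded><![CDATA[<div><span style="color: #ff0000;"><strong>Please do not copy my code directly. Work through it yourself, type it out, and think about why I wrote it this way.</strong></span></div>
<div><span style="color: #ff0000;"><strong>Please use the latest version of each tool, especially tools such as samtools that I call from my system PATH without a version number. Because this post has many readers, I generally include the version in the path for the other tools!</strong></span></div>
<div>It took me two hours, but that does not mean you will learn it in two hours. Some readers have told me it took them two weeks, which is perfectly normal. Do not expect to reach my level in two hours.</div>
<div></div>
<div>If you only look at expression levels, transcriptome analysis really is very simple. Besides, the original authors sequenced SE50 (single-end, 50 bp) reads, and data this limited is only good for quantifying expression anyway!</div>
<div>First, here is the result of the authors' analysis:</div>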
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/17.png"><img class="alignnone size-full wp-image-2224" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/17.png" alt="1" width="619" height="325" /></a></div>
<p><span id="more-2218"></span></p>
<div>The data are available on GEO at: <a href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50177">https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50177</a></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/25.png"><img class="alignnone size-full wp-image-2225" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/25.png" alt="2" width="622" height="388" /></a></div>
<div>The RNA-seq data we need to download:</div>
<div><a href="https://www.ncbi.nlm.nih.gov//sra/?term=SRP029245">https://www.ncbi.nlm.nih.gov//sra/?term=SRP029245</a></div>
<div><a href="https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP029245">https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP029245</a></div>
<div><a href="ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP029/SRP029245">ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP029/SRP029245</a></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/33.png"><img class="alignnone size-full wp-image-2219" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/33.png" alt="3" width="690" height="79" /></a></div>
<div>The download URLs are easy to work out from the study directory:</div>
<div>for ((i=677;i&lt;=680;i++)) ;do wget <a href="ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP029/SRP029245">ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP029/SRP029245</a>/SRR957$i/SRR957$i.sra;done</div>
<div>ls *sra |while read id; do ~/biosoft/sratoolkit/sratoolkit.2.6.3-centos_linux64/bin/fastq-dump --split-3 $id;done</div>
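<div>As a rough illustration of how the download URLs above are put together, here is a minimal Python sketch. It only builds the four URL strings from the run accessions and downloads nothing. The helper name <code>sra_url</code> is made up for this example, and the FTP layout is simply the one used in the wget loop above, which may no longer be served by NCBI.</div>
<pre>
```python
# Base directory copied from the wget command above.
BASE = "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP029/SRP029245"


def sra_url(run):
    """Return the .sra URL for one run accession (hypothetical helper)."""
    return f"{BASE}/{run}/{run}.sra"


# SRR957677 .. SRR957680, the same range as the shell for-loop.
runs = [f"SRR957{i}" for i in range(677, 681)]
urls = [sra_url(run) for run in runs]

for url in urls:
    print(url)
```
</pre>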
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/42.png"><img class="alignnone size-full wp-image-2220" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/42.png" alt="4" width="339" height="160" /></a></div>
<div></div>
<div>I checked the read quality with fastqc and found no problems. The command:</div>
<div>ls *fastq |xargs ~/biosoft/fastqc/FastQC/fastqc -t 10</div>
<div>So I aligned the sequenced fastq files directly to the hg19 reference genome with hisat2:</div>
<div>reference=/home/jianmingzeng/reference/index/hisat/hg19/genome</div>
<div>~/biosoft/HISAT/current/hisat2 -p 5 -x $reference -U SRR957677.fastq -S control_1.sam 2&gt;control_1.log</div>
<div>~/biosoft/HISAT/current/hisat2 -p 5 -x $reference -U SRR957678.fastq -S control_2.sam 2&gt;control_2.log</div>
<div>~/biosoft/HISAT/current/hisat2 -p 5 -x $reference -U SRR957679.fastq -S siSUZ12_1.sam 2&gt;siSUZ12_1.log</div>
<div>~/biosoft/HISAT/current/hisat2 -p 5 -x $reference -U SRR957680.fastq -S siSUZ12_2.sam 2&gt;siSUZ12_2.log</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/51.png"><img class="alignnone size-full wp-image-2221" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/51.png" alt="5" width="229" height="64" /></a></div>
<div></div>
<div>The log files show that the alignment rates are excellent:</div>
<div>93.10% overall alignment rate<br />
92.44% overall alignment rate<br />
92.36% overall alignment rate<br />
93.22% overall alignment rate</div>
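<div>The summary line can also be pulled out of each log programmatically. A minimal Python sketch, assuming the log contains a line of the form shown above (<code>NN.NN% overall alignment rate</code>); the function name and the sample text are illustrative only:</div>
<pre>
```python
import re


def overall_alignment_rate(log_text):
    """Return the overall alignment rate in percent, or None if absent."""
    match = re.search(r"([0-9.]+)% overall alignment rate", log_text)
    return float(match.group(1)) if match else None


# Illustrative text only; a real hisat2 log has more lines before this one.
example_log = "93.10% overall alignment rate\n"
rate = overall_alignment_rate(example_log)
print(rate)
```
</pre>
<div>In practice you would read control_1.log and the other three log files and pass their contents to the function.</div>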
<div></div>
<div>Next, sort the SAM files by read name and convert them to BAM to save space:</div>
<div>ls *sam |while read id;do (nohup samtools sort -n -@ 5 -o ${id%%.*}.Nsort.bam $id &amp;);done</div>
<div><img class="alignnone size-full wp-image-2222" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/6.png" alt="6" width="271" height="75" /></div>
<div>Finally, use htseq-count to quantify gene expression for each sample!</div>
<div>ls *.Nsort.bam |while read id;do (nohup samtools view $id | ~/.local/bin/htseq-count -f sam -s no -i gene_name - ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf 1&gt;${id%%.*}.geneCounts 2&gt;${id%%.*}.HTseq.log&amp;);done</div>
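<div>This produces, for each sample, a <code>.geneCounts</code> file with the counts and a <code>.HTseq.log</code> file.</div>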
<div></div>
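<div>The gene counts for these 4 samples can now go into any of several R packages for differential expression analysis, including limma (voom), DESeq2, and edgeR. Tutorials for these packages are everywhere, so I will not repeat them here.</div>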
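<div>The differential expression packages mentioned here (limma, DESeq2, edgeR) are usually given a single matrix with genes as rows and samples as columns, so one way to prepare the input is to merge the per-sample files first. A minimal Python sketch, assuming each <code>.geneCounts</code> file is the two-column, tab-separated output of htseq-count (gene name, count) with summary rows such as <code>__no_feature</code> at the end. The function names are made up for this example and the toy counts are invented, not real results:</div>
<pre>
```python
import io


def read_counts(handle):
    """Parse htseq-count output into {gene: count}, skipping '__' summary rows."""
    counts = {}
    for line in handle:
        name, value = line.rstrip("\n").split("\t")
        if name.startswith("__"):  # e.g. __no_feature, __ambiguous
            continue
        counts[name] = int(value)
    return counts


def merge_counts(samples):
    """Combine {sample: {gene: count}} into (header, rows) sorted by gene."""
    names = sorted(samples)
    genes = sorted(set().union(*(samples[s].keys() for s in names)))
    rows = [[g] + [samples[s].get(g, 0) for s in names] for g in genes]
    return ["gene"] + names, rows


# Toy input standing in for two .geneCounts files (made-up numbers).
toy = {
    "control_1": read_counts(io.StringIO("GENE_A\t10\nGENE_B\t0\n__no_feature\t5\n")),
    "siSUZ12_1": read_counts(io.StringIO("GENE_A\t3\nGENE_B\t7\n__no_feature\t9\n")),
}
header, rows = merge_counts(toy)
print("\t".join(header))
for row in rows:
    print("\t".join(str(x) for x in row))
```
</pre>
<div>With the real data, each toy entry would be replaced by <code>read_counts(open(path))</code> for one of the four <code>.geneCounts</code> files.</div>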
<div>Once the differential analysis is done, you can compare it with the authors' result to check whether your own analysis is correct.</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/7.png"><img class="alignnone size-full wp-image-2223" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/7.png" alt="7" width="930" height="615" /></a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2218.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Teach yourself miRNA-seq analysis, part 5: obtaining miRNA expression levels</title>
		<link>http://www.bio-info-trainee.com/1712.html</link>
		<comments>http://www.bio-info-trainee.com/1712.html#comments</comments>
		<pubDate>Sat, 25 Jun 2016 09:34:46 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[tutorial]]></category>
		<category><![CDATA[HTseq]]></category>
		<category><![CDATA[miRNA-seq]]></category>
		<category><![CDATA[Expression level]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1712</guid>
		<description><![CDATA[The sam/bam files produced by alignment only count as level 2 data; what we usually give others &#8230; <a href="http://www.bio-info-trainee.com/1712.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
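				<content:encoded><![CDATA[<p>The sam/bam files produced by alignment only count as level 2 data; when we share results with others, we usually hand over the expression matrix directly. miRNA analysis is similar to mRNA analysis, but the expression matrix is somewhat easier to obtain. For mRNA, we usually align against the genome, which has just 24 reference chromosomes, so finding out which gene a read hit requires a program that extracts expression values using the genome annotation file. The popular tool for this is htseq; I have already written a tutorial on installing and using it, so I will not repeat it here. For miRNA, however, I aligned directly against the 1881 precursor miRNA sequences, so the expression level of each reference miRNA sequence can be read straight from the aligned sam/bam files. <span id="more-1712"></span></p>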
<blockquote>
<div>## step6: count the reads that map to each miRNA reference.</div>
<div></div>
<div>## we need to exclude unmapped as well as multiple-mapped reads</div>
<div></div>
<div>## XS:i:&lt;n&gt; Alignment score for the second-best alignment. Can be negative. Can be greater than 0 in --local mode</div>
<div>## NM:i:&lt;n&gt; Edit distance to the reference, including ambiguous bases but excluding clipping (e.g. NM:i:1)</div>
<div>## The following command excludes unmapped (-F 4) as well as multiple-mapped (grep -v "XS:") reads</div>
<div>#samtools view -F 4 input.bam | grep -v "XS:" | wc -l</div>
<div></div>
<div>## 180466//1520320</div>
<div></div>
<div>## cat &gt;count.hairpin.sh</div>
<div></div>
<div>ls *hairpin.sam  | while read id</div>
<div>do</div>
<div><strong>samtools view  -SF 4 $id |perl -alne '{$h{$F[2]}++}END{print "$_\t$h{$_}" foreach sort keys %h }'  &gt; ${id%%_*}.hairpin.counts</strong></div>
<div>done</div>
<div></div>
<div>## bash count.hairpin.sh</div>
<div></div>
<div>## cat &gt;count.mature.sh</div>
<div></div>
<div>ls *mature.sam  | while read id</div>
<div>do</div>
<div><strong>samtools view  -SF 4 $id |perl -alne '{$h{$F[2]}++}END{print "$_\t$h{$_}" foreach sort keys %h }'  &gt; ${id%%_*}.mature.counts</strong></div>
<div>done</div>
<div></div>
<div>## bash count.mature.sh</div>
</blockquote>
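<div>The code above is my own script for computing expression levels. It is very simple because I ignored the details (for example, it only drops unmapped reads with -F 4 and does not filter multiple-mapped reads) and just wanted per-reference read counts for each sample. If you had aligned to the reference genome instead, you would need a tool such as htseq together with the miRNA gff annotation file to compute expression levels.</div>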
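<div>For readers who do not read Perl, the counting logic of the one-liner above can be sketched in Python. This is an illustration under the same simplifications, not a replacement for the script: it reads SAM text, skips header lines and reads with the 0x4 (unmapped) flag bit set, and tallies column 3 (the reference name). The function name and the toy records are invented for the example:</div>
<pre>
```python
from collections import Counter


def count_references(sam_lines):
    """Tally mapped reads per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line, not an alignment record
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 4:  # 0x4 set: read is unmapped (same filter as samtools -F 4)
            continue
        counts[fields[2]] += 1
    return counts


# Toy SAM records (made up): two mapped reads and one unmapped read.
toy_sam = [
    "@SQ\tSN:hsa-mir-21\tLN:72",
    "r1\t0\thsa-mir-21\t8\t42\t22M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\thsa-mir-21\t8\t42\t22M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
counts = count_references(toy_sam)
for name in sorted(counts):
    print(f"{name}\t{counts[name]}")
```
</pre>
<div>Like the one-liner, the output is one tab-separated line per reference name, sorted by name.</div>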
<div>With the expression levels in hand, we can compare them with the paper:</div>
<blockquote>
<div>### step7: compare the results with paper's</div>
<div>GSM1470353: control-CM, experiment1; Homo sapiens; miRNA-Seq   SRR1542714</div>
<div>GSM1470354: ET1-CM, experiment1; Homo sapiens; miRNA-Seq  SRR1542715</div>
<div>GSM1470355: control-CM, experiment2; Homo sapiens; miRNA-Seq SRR1542716</div>
<div>GSM1470356: ET1-CM, experiment2; Homo sapiens; miRNA-Seq SRR1542717</div>
<div>GSM1470357: control-CM, experiment3; Homo sapiens; miRNA-Seq SRR1542718</div>
<div>GSM1470358: ET1-CM, experiment3; Homo sapiens; miRNA-Seq SRR1542719</div>
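<div>### Below I use R to check how my results differ from the results published with the paper.</div>
<div> <strong>a=read.table("bowtie_bam/SRR1542714.mature.counts")</strong></div>
<div><strong> b=read.table("paper_results/GSM1470353_iPS_010313_Unstim_known_miRNA_counts.txt")</strong></div>
<div> ## assumed step: merge the two tables by miRNA name (column V1; assumes neither file has a header row) so that tmp holds both count columns</div>
<div> tmp=merge(a,b,by="V1")</div>
<div> plot(log(tmp[,2]),log(tmp[,3]))</div>
<div> cor(tmp[,2],tmp[,3])</div>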
<div><strong>##[1] 0.8413439</strong></div>
</blockquote>
<div>The correlation is decent, so at least the analysis was not wrong.</div>
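<div>The comparison above can be written out step by step. A minimal Python sketch of what <code>cor()</code> computes by default in R (the Pearson correlation), restricted to the miRNA names found in both tables. The function name and the toy counts are invented for illustration and are not the values behind the 0.84 reported above:</div>
<pre>
```python
import math


def pearson_on_shared(a, b):
    """Pearson correlation of two {name: count} tables over shared names."""
    shared = sorted(set(a) & set(b))
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy counts (made up); "only-in-mine" is dropped because it is not shared.
mine = {"miR-a": 10, "miR-b": 20, "miR-c": 30, "only-in-mine": 5}
paper = {"miR-a": 20, "miR-b": 40, "miR-c": 60}
r = pearson_on_shared(mine, paper)
print(round(r, 4))
```
</pre>
<div>Restricting to shared names matters because the two tables need not list the same miRNAs.</div>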
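<div>I wrote this code based on my own reading of the paper. I am not an expert in miRNA data analysis, so my alignment parameters may not be ideal. If an expert can point out corrections, that would be great: you can call me directly or send me an email. My mailbox username is jmzeng1314, at 163 mail.</div>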
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1712.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
