<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; RPKM</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/rpkm/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Leave a comment and join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>TPM is just RPKM expressed as a proportion!</title>
		<link>http://www.bio-info-trainee.com/2017.html</link>
		<comments>http://www.bio-info-trainee.com/2017.html#comments</comments>
		<pubDate>Mon, 14 Nov 2016 11:34:12 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic data formats]]></category>
		<category><![CDATA[FPKM]]></category>
		<category><![CDATA[RPKM]]></category>
		<category><![CDATA[TPM]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2017</guid>
		<description><![CDATA[Someone asked me about this a long time ago. RPKM/FPKM is still the mainstream way to describe a gene &#8230; <a href="http://www.bio-info-trainee.com/2017.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
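				<content:encoded><![CDATA[<p>Someone asked me about this a long time ago. RPKM/FPKM is still the mainstream way to describe a gene's expression level, but since everyone says TPM is better, let me look into it too!</p>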
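<p>I don't like reading formulas, so let me go straight to an example. Say gene A has 5,000 reads that were sequenced in this sample's transcriptome data and mapped to the genome, gene A is 10 kb long, and the whole sequencing library holds 50 M reads. Gene A's RPKM is then 5,000 divided by 10 (the length in kb), divided again by 50 (the library size in millions), which gives 10. In other words, the gene's read count is normalized by the gene's length and by the sample's sequencing library size.<span id="more-2017"></span></p>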
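<p>The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the example: the function name <code>rpkm</code> and its parameter names are made up here and do not come from any particular package.</p>

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads."""
    length_kb = gene_length_bp / 1_000                  # gene length in kilobases
    library_millions = total_mapped_reads / 1_000_000   # library size in millions
    return read_count / length_kb / library_millions


# Gene A from the text: 5,000 reads, 10 kb long, 50 M reads in the library.
gene_a_rpkm = rpkm(5_000, 10_000, 50_000_000)
print(gene_a_rpkm)  # 10.0
```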
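<p>So what is its TPM? The information above is no longer enough: we also need the RPKM values of the sample's other genes. Suppose the sample has only 3 genes and the other two have RPKM values of 5 and 35. Converting gene A's RPKM of 10 to TPM then gives<strong><span style="text-decoration: underline;"><span style="color: #ff00ff; text-decoration: underline;"> 1,000,000 * 10 / (5 + 10 + 35) = 200,000.</span></span></strong> That looks rather large, but mainly because we assumed so few genes. An individual typically has more than twenty thousand genes, so the sum is far larger and the gap between TPM and RPKM is nowhere near this dramatic.</p>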
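<p>The same conversion for the three-gene toy sample, as a minimal sketch. The helper name <code>rpkm_to_tpm</code> is hypothetical; it just divides each RPKM by the sample total and scales to one million.</p>

```python
def rpkm_to_tpm(rpkm_values):
    """Convert the RPKM values of all genes in ONE sample to TPM."""
    total = sum(rpkm_values)
    return [1_000_000 * value / total for value in rpkm_values]


# The toy sample from the text: RPKM values 5, 10 (gene A) and 35.
tpm = rpkm_to_tpm([5, 10, 35])
print(tpm)       # [100000.0, 200000.0, 700000.0]
print(sum(tpm))  # 1000000.0
```

<p>Because every value is divided by the same sample total, the TPM values of a sample always add up to one million, whatever the sequencing depth was.</p>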
<p><span style="color: #ff00ff;"><strong>TPM is simply each gene's RPKM as a share of the sample's total RPKM, scaled to one million!!!</strong></span></p>
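<p>You will surely ask: what is the advantage of TPM? Clearly, the TPM values of all genes always add up to 1 M, because the proportions sum to 1 before scaling. That holds regardless of the sample, so every sample has the same total TPM, which makes comparisons between samples more meaningful!!!</p>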
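<p>I have not covered FPKM here. It works like RPKM but counts fragments instead of reads, so there is not much more to it; look it up yourself if you need it.</p>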
<p>Finally, here are the formulas anyway!</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/12.png"><img class="alignnone size-full wp-image-2018" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/12.png" alt="1" width="613" height="587" /></a></p>
<p>A big pile of references I was too lazy to read:</p>
<div><a href="http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/">http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/</a></div>
<div><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702322/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702322/</a></div>
<div><a href="https://www.biostars.org/p/88751/">https://www.biostars.org/p/88751/</a></div>
<div><a href="https://www.biostars.org/p/133488/">https://www.biostars.org/p/133488/</a></div>
<div><a href="https://www.biostars.org/p/115674/">https://www.biostars.org/p/115674/</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2017.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting an RPKM expression matrix with RNA-SeQC</title>
		<link>http://www.bio-info-trainee.com/1349.html</link>
		<comments>http://www.bio-info-trainee.com/1349.html#comments</comments>
		<pubDate>Thu, 14 Jan 2016 12:40:14 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[RNA-SeQC]]></category>
		<category><![CDATA[RPKM]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1349</guid>
		<description><![CDATA[This tool does more than QC: it can also compute RPKM values for every gene! In particular, in the TCGA project &#8230; <a href="http://www.bio-info-trainee.com/1349.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>
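<h5>This tool does more than QC: it can also compute RPKM values for every gene! In particular, the values in the TCGA project were all computed with it</h5>
<h5><b>1. Installation</b></h5>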
</div>
<div>Just download the Java version from the official site and it is ready to <b>use:</b> <a href="http://www.broadinstitute.org/cancer/cga/tools/rnaseqc/RNA-SeQC_v1.1.8.jar">http://www.broadinstitute.org/cancer/cga/tools/rnaseqc/RNA-SeQC_v1.1.8.jar</a></div>
<div><span style="color: #ff0000;"><b>But you also need to download a lot of annotation data</b></span></div>
<div><span style="color: #ff0000;"><b><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard6.png"><img class="alignnone size-full wp-image-1350" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard6.png" alt="clipboard" width="752" height="331" /></a></b></span></div>
<div><span style="color: #ff0000;"><b><b>2. Input data</b></b></span></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard7.png"><img class="alignnone size-full wp-image-1351" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard7.png" alt="clipboard" width="595" height="244" /></a></div>
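<div>Every file the arrows point to is required. The only one I did not use is rRNA.tar, because the tool can be run in two modes and I used the first one, which skips the BWA-based rRNA estimation.</div>
<div><b><b>3. Running the tool</b></b></div>
<div>The official site gives examples, which are easy to follow:</div>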
<div>RNA-SeQC can be run with or without a BWA-based rRNA level estimation mode. To run without (less accurate, but faster) use the command:<br />
<strong>java -jar RNASeQC.jar -n 1000 -s "TestId|ThousandReads.bam|TestDesc" -t gencode.v7.annotation_goodContig.gtf -r Homo_sapiens_assembly19.fasta -o ./testReport/ -strat gc -gc gencode.v7.gc.txt </strong></div>
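<div><b>This is the example I used. In every file it needs, the chromosome names have no "chr" prefix, which is very important!!! The BAM, the GTF annotation and the reference FASTA all have to use the same naming.</b></div>
<div>The command is:</div>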
<div><b>java -jar RNA-SeQC_v1.1.8.jar \<br />
-n 1000 \<br />
-s "TestId|ThousandReads.bam|TestDesc" \<br />
-t gencode.v7.annotation_goodContig.gtf \<br />
-r ~/ref-database/human_g1k_v37/human_g1k_v37.fasta \<br />
-o ./testReport/ \<br />
-strat gc \<br />
-gc gencode.v7.gc.txt</b></div>
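<div>Since mismatched chromosome names are the easiest way to break the command above, here is a rough Python sketch for checking the "chr" prefix in the FASTA and GTF inputs before running. It is not part of RNA-SeQC; the helper names are made up for illustration, and the inline sample lines stand in for real files.</div>

```python
def fasta_contigs(lines):
    """Contig names from FASTA header lines (text after '>' up to the first space)."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()}


def gtf_contigs(lines):
    """Contig names from the first tab-separated column of a GTF, skipping comments."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def uses_chr_prefix(names):
    """True if every name starts with 'chr', False if none does, None if mixed or empty."""
    flags = {name.startswith("chr") for name in names}
    return flags.pop() if len(flags) == 1 else None


# Tiny stand-ins for the real inputs; with real files, pass open(path) instead.
fasta = [">1 dna:chromosome", "ACGT", ">2", "GGCC"]
gtf = ["##description: demo",
       '1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1";']

print(uses_chr_prefix(fasta_contigs(fasta)))  # False
print(uses_chr_prefix(gtf_contigs(gtf)))      # False
```

<div>Both checks should return the same value. The BAM header can be inspected separately, for example with <code>samtools view -H</code>.</div>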
<div></div>
<div>To run the more accurate but slower BWA-based method:<br />
<strong>java -jar RNASeQC.jar -n 1000 -s "TestId|ThousandReads.bam|TestDesc" -t gencode.v7.annotation_goodContig.gtf -r Homo_sapiens_assembly19.fasta -o ./testReport/ -strat gc -gc gencode.v7.gc.txt -BWArRNA human_all_rRNA.fasta</strong><br />
Note: this assumes BWA is in your PATH. If this is not the case, use the <strong>-bwa</strong> flag to specify the path to BWA.</div>
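<div><b><b>4. Interpreting the results</b></b></div>
<div>It takes a while to run: even the test data with just one thousand reads took 10 minutes!</div>
<div>It produces a big pile of output files, and the official site explains them in detail. The most important ones, of course, are the RPKM values, along with the QC information.</div>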
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard8.png"><img class="alignnone size-full wp-image-1352" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard8.png" alt="clipboard" width="562" height="294" /></a></div>
<div>
<h5>TCGA data always comes with expression matrices produced by <a href="http://www.broadinstitute.org/rna-seqc" target="_blank">RNA-SeQC</a>!</h5>
<h5>Expression</h5>
<ul>
<li>RPKM data are used as produced by <a href="http://www.broadinstitute.org/rna-seqc" target="_blank">RNA-SeQC.</a></li>
<li>Filter on &gt;=10 individuals with &gt;0.1 RPKM and raw read counts greater than 6.</li>
<li>Quantile normalization was performed within each tissue to bring the expression profile of each sample onto the same scale.</li>
<li>To protect from outliers, inverse quantile normalization was performed for each gene, mapping each set of expression values to a standard normal.</li>
</ul>
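<div>Two of the steps listed above can be sketched in plain Python. This is only my reading of the list, not the pipeline's actual code: I assume a gene is kept when at least 10 samples have both RPKM above 0.1 and a raw count above 6, and I use rank / (n + 1) as the quantile convention for the inverse normal step, with no special handling of ties. The function names are hypothetical, and the cross-sample quantile normalization is not covered.</div>

```python
from statistics import NormalDist


def passes_filter(rpkm_row, count_row, min_samples=10, min_rpkm=0.1, min_count=6):
    """Keep a gene if enough samples exceed both the RPKM and raw-count thresholds."""
    n_ok = sum(1 for r, c in zip(rpkm_row, count_row)
               if r > min_rpkm and c > min_count)
    return n_ok >= min_samples


def inverse_normal_transform(values):
    """Map one gene's values to standard-normal quantiles by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # sample indices, low to high
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = NormalDist().inv_cdf(rank / (n + 1))
    return out


# One gene across 10 samples: every sample clears both thresholds.
print(passes_filter([0.2] * 10, [7] * 10))  # True
# The middle-ranked value maps to 0, the extremes to symmetric quantiles.
print(inverse_normal_transform([3.0, 1.0, 2.0]))
```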
<div>The software's home pages are:</div>
<div><a href="http://www.broadinstitute.org/cancer/cga/rnaseqc_run" target="_blank">http://www.broadinstitute.org/cancer/cga/rnaseqc_run</a></div>
<div><a href="http://www.broadinstitute.org/cancer/cga/rnaseqc_download" target="_blank">http://www.broadinstitute.org/cancer/cga/rnaseqc_download</a></div>
<div>Help file: <a href="http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/rnaseqc/RNA-SeQC_Help_v1.1.2.pdf">http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/rnaseqc/RNA-SeQC_Help_v1.1.2.pdf</a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1349.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
