<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; RNA-SeQC</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/rna-seqc/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
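	<description>You are welcome to leave comments and join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>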
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Repeated header errors when processing BAM files with Broad Institute software</title>
		<link>http://www.bio-info-trainee.com/1354.html</link>
		<comments>http://www.bio-info-trainee.com/1354.html#comments</comments>
		<pubDate>Thu, 14 Jan 2016 12:41:28 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic software]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[gatk]]></category>
		<category><![CDATA[RNA-SeQC]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1354</guid>
		<description><![CDATA[The error is as follows: ERROR MESSAGE: SAM/BAM file input.m &#8230; <a href="http://www.bio-info-trainee.com/1354.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The error is as follows: ERROR MESSAGE: SAM/BAM file input.marked.bam is malformed: SAM file doesn't have any read groups defined in the header.  The GATK no longer supports SAM files without read groups!</p>
<div><span style="color: #333333; font-family: Helvetica Neue,Helvetica,Arial,sans-serif;">Some people run into a BAM whose chromosomes are in a different order from the reference, or whose chromosome names differ, for example &gt;1 versus &gt;chr1. It looks like a silly mistake, but quite a few people hit it!</span></div>
<div><span style="color: #333333; font-family: Helvetica Neue,Helvetica,Arial,sans-serif;">Others find that the reference genome has no .dict file; processing it with Picard fixes that too.<br />
</span></p>
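<div>To see whether the chromosome names or their order differ between a BAM and the reference, one option is to compare the @SQ lines of the two headers. The helper below is only a sketch: sq_names is a name made up for illustration, and the file names in the usage comments (my.bam, ref.dict) are assumptions.</div>

```shell
# sq_names: print the SN: value of every @SQ header line read from stdin,
# in the order the lines appear (hypothetical helper, not part of any tool).
sq_names() {
  awk -F '\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/) print substr($i, 4)
  }'
}

# Possible usage (assumes samtools is installed and ref.dict was made by Picard):
#   samtools view -H my.bam | sq_names > bam_contigs.txt
#   sq_names < ref.dict > ref_contigs.txt
#   diff bam_contigs.txt ref_contigs.txt   # any output means names or order differ
```

<div>Because the names are printed in header order, a plain diff catches both a naming mismatch (1 versus chr1) and a sorting mismatch.</div>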
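<div>Most people hit the read group error in GATK; I hit it in RNA-SeQC, but the cause is the same.</div>
<div>In every case it happens because <b>no read group header was added</b> during alignment, for example:</div>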
<div>bwa samse ref.fa my.sai my.fastq &gt; my.sam</div>
<div>samtools view -bS my.sam &gt; my.bam</div>
<div>samtools sort my.bam my_sorted</div>
<div>java -jar ReorderSam.jar I=/path/my_sorted.bam O=/path/my_reordered.bam R=/path/ref.fa</div>
<div>These commands produce a <b>sorted BAM</b>, but running GATK on it afterwards fails:</div>
<div>java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R /path/ref.fa -I /path/my_reordered.bam</div>
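<div>A quick check before handing a BAM to GATK or RNA-SeQC is to look for @RG lines in its header. This is a minimal sketch; has_read_group is a hypothetical helper name, and the usage comment assumes samtools is on the PATH.</div>

```shell
# has_read_group: succeed (exit 0) if the SAM header text on stdin
# contains at least one @RG line, fail (exit 1) otherwise.
has_read_group() {
  awk -F '\t' '$1 == "@RG" { found = 1 } END { exit !found }'
}

# Possible usage:
#   samtools view -H /path/my_reordered.bam | has_read_group \
#     || echo "no read groups in header"
```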
<div>The error occurs because the header carries no read group information. There are two ways to fix it:</div>
<div>bwa samse -r<span class="Apple-converted-space"> </span><b>'@RG\tID:IDa\tSM:SM\tPL:Illumina'<span class="Apple-converted-space"> </span></b>ref.fa my.sai my.fastq &gt; my.sam</div>
<div>java -jar AddOrReplaceReadGroups.jar I=my.bam O=myGr.bam<span class="Apple-converted-space"> </span><b>LB=whatever PL=illumina PU=whatever SM=whatever</b></div>
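<div>The first is to add the read group header at alignment time, which requires support from the aligner.</div>
<div>The second is to modify the BAM with Picard, and this is the one I recommend! I don't actually understand what this header information is for, but software developed by Broad simply requires it! I hope to study these fundamentals systematically when I go on to a PhD.</div>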
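<div>For the first approach, the read group string has to reach bwa with its backslash-t sequences intact, so it should be quoted. The helper below is only a sketch of how one might assemble it; make_rg is a made-up name, and the ID, sample and platform values are placeholders.</div>

```shell
# make_rg ID SAMPLE PLATFORM
# Print an @RG line with literal "\t" separators, the form accepted by
# the -r option of bwa samse (hypothetical helper for illustration).
make_rg() {
  printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\n' "$1" "$2" "$3"
}

# Possible usage (assumes bwa is installed and my.sai already exists):
#   bwa samse -r "$(make_rg IDa SM Illumina)" ref.fa my.sai my.fastq > my.sam
```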
<div></div>
<div>References: <a href="http://seqanswers.com/forums/showthread.php?t=17233">http://seqanswers.com/forums/showthread.php?t=17233</a></div>
<div><a href="https://www.biostars.org/p/115819/">https://www.biostars.org/p/115819/</a></div>
<div><a href="http://gatkforums.broadinstitute.org/gatk/discussion/2667/bam-is-malformed-depthofcoverage">http://gatkforums.broadinstitute.org/gatk/discussion/2667/bam-is-malformed-depthofcoverage</a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1354.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting an RPKM expression matrix with RNA-SeQC</title>
		<link>http://www.bio-info-trainee.com/1349.html</link>
		<comments>http://www.bio-info-trainee.com/1349.html#comments</comments>
		<pubDate>Thu, 14 Jan 2016 12:40:14 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[RNA-SeQC]]></category>
		<category><![CDATA[RPKM]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1349</guid>
		<description><![CDATA[This software does more than QC: it can also compute RPKM values for every gene! In particular, in the TCGA project &#8230; <a href="http://www.bio-info-trainee.com/1349.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>
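<h5>This software does more than QC: it can also compute RPKM values for every gene! In particular, the values in the TCGA project were all calculated with it</h5>
<h5><b>1. Installation</b></h5>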
</div>
<div>Just download the Java version from the official site and it is <b>ready to use: </b><a href="http://www.broadinstitute.org/cancer/cga/tools/rnaseqc/RNA-SeQC_v1.1.8.jar">http://www.broadinstitute.org/cancer/cga/tools/rnaseqc/RNA-SeQC_v1.1.8.jar</a></div>
<div><span style="color: #ff0000;"><b>However, a lot of annotation data has to be downloaded as well</b></span></div>
<div><span style="color: #ff0000;"><b><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard6.png"><img class="alignnone size-full wp-image-1350" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard6.png" alt="clipboard" width="752" height="331" /></a></b></span></p>
<div><span style="color: #ff0000;"><b><b>2. Input data</b></b></span></div>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard7.png"><img class="alignnone size-full wp-image-1351" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard7.png" alt="clipboard" width="595" height="244" /></a></div>
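<div>Every file the arrows point to is required. The only one I did not use is rRNA.tar, because the software can be run in two ways and I used the first.</div>
<div><b><b>3. Usage</b></b></div>
<div>The official site gives examples that are easy to follow:</div>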
<div>RNA-SeQC can be run with or without a BWA-based rRNA level estimation mode. To run without (less accurate, but faster) use the command:<br />
<strong>java -jar RNASeQC.jar -n 1000 -s "TestId|ThousandReads.bam|TestDesc" -t gencode.v7.annotation_goodContig.gtf -r Homo_sapiens_assembly19.fasta -o ./testReport/ -strat gc -gc gencode.v7.gc.txt </strong></div>
<div><b>This is the example I used. In every file it needs, the chromosome names have no chr prefix, and that is very important!!!</b></div>
<div>My command was:</div>
<div><b> java -jar RNA-SeQC_v1.1.8.jar  \<br />
-n 1000 \<br />
-s "TestId|ThousandReads.bam|TestDesc" \<br />
-t gencode.v7.annotation_goodContig.gtf \<br />
-r ~/ref-database/human_g1k_v37/human_g1k_v37.fasta  \<br />
-o ./testReport/ \<br />
-strat gc \<br />
-gc gencode.v7.gc.txt</b></div>
<div></div>
<div>To run the more accurate but slower, BWA-based method :<br />
<strong>java -jar RNASeQC.jar -n 1000 -s "TestId|ThousandReads.bam|TestDesc" -t gencode.v7.annotation_goodContig.gtf -r Homo_sapiens_assembly19.fasta -o ./testReport/ -strat gc -gc gencode.v7.gc.txt -BWArRNA human_all_rRNA.fasta</strong><br />
Note: this assumes BWA is in your PATH. If this is not the case, use the<span class="Apple-converted-space"> </span><strong>-bwa</strong><span class="Apple-converted-space"> </span>flag to specify the path to BWA</div>
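<div><b><b>4. Interpreting the results</b></b></div>
<div>It takes a while to run: even the test data with only a thousand reads took 10 minutes!</div>
<div>A large pile of results comes out, and the official site explains each of them in detail. The most important parts, of course, are the RPKM values and the QC information.</div>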
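<div>As a reminder of what an RPKM value means, the standard definition is reads mapped to the gene, times 10^9, divided by (gene length in bp times total mapped reads). The sketch below just evaluates that textbook formula; rpkm is a hypothetical helper, and exactly how RNA-SeQC counts reads and measures gene length is not covered here.</div>

```shell
# rpkm READ_COUNT GENE_LENGTH_BP TOTAL_MAPPED_READS
# Textbook RPKM: count * 1e9 / (length_bp * total_reads), two decimals.
rpkm() {
  awk -v c="$1" -v len="$2" -v n="$3" \
    'BEGIN { printf "%.2f\n", c * 1e9 / (len * n) }'
}

# Example: 100 reads on a 2000 bp gene, out of 1,000,000 mapped reads
rpkm 100 2000 1000000   # prints 50.00
```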
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard8.png"><img class="alignnone size-full wp-image-1352" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard8.png" alt="clipboard" width="562" height="294" /></a></div>
<div>
<div></div>
</div>
<div>
<h5>TCGA data always comes with an expression matrix produced by <a href="http://www.broadinstitute.org/rna-seqc" target="_blank">RNA-SeQC</a>!</h5>
<h5>Expression</h5>
<ul>
<li>RPKM data are used as produced by <a href="http://www.broadinstitute.org/rna-seqc" target="_blank">RNA-SeQC.</a></li>
<li>Filter on &gt;=10 individuals with &gt;0.1 RPKM and raw read counts greater than 6.</li>
<li>Quantile normalization was performed within each tissue to bring the expression profile of each sample onto the same scale.</li>
<li>To protect from outliers, inverse quantile normalization was performed for each gene, mapping each set of expression values to a standard normal.</li>
</ul>
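<div>The first filtering rule in the list above can be sketched with awk. This is only an illustration under assumptions: the input is taken to be a tab-separated matrix with a header row, gene names in column 1 and one RPKM column per sample; keep_expressed is a made-up name; and the raw read count condition is left out because it needs a second matrix.</div>

```shell
# keep_expressed MIN_SAMPLES THRESHOLD < rpkm_matrix.tsv
# Keep the header plus every gene whose RPKM exceeds THRESHOLD
# in at least MIN_SAMPLES samples (hypothetical helper).
keep_expressed() {
  awk -F '\t' -v min="$1" -v thr="$2" '
    NR == 1 { print; next }          # pass the header row through
    {
      n = 0
      for (i = 2; i <= NF; i++)      # columns 2..NF hold per-sample RPKM
        if ($i + 0 > thr) n++
      if (n >= min) print
    }'
}

# Possible usage for the rule quoted above:
#   keep_expressed 10 0.1 < rpkm_matrix.tsv > rpkm_filtered.tsv
```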
<div>The software's home pages are:</div>
<div><a href="http://www.broadinstitute.org/cancer/cga/rnaseqc_run" target="_blank">http://www.broadinstitute.org/cancer/cga/rnaseqc_run</a></div>
<div><a href="http://www.broadinstitute.org/cancer/cga/rnaseqc_download" target="_blank">http://www.broadinstitute.org/cancer/cga/rnaseqc_download</a></div>
<div>Help file: <a href="http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/rnaseqc/RNA-SeQC_Help_v1.1.2.pdf">http://www.broadinstitute.org/cancer/cga/sites/default/files/data/tools/rnaseqc/RNA-SeQC_Help_v1.1.2.pdf</a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1349.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
