<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; 转录组</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e8%bd%ac%e5%bd%95%e7%bb%84/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
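	<description>You are welcome to leave comments and join the discussion on the forum biotrainee.com, or to follow the WeChat official account of the same name, biotrainee.</description>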
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Inferring CNV from single-cell transcriptome data</title>
		<link>http://www.bio-info-trainee.com/3065.html</link>
		<comments>http://www.bio-info-trainee.com/3065.html#comments</comments>
		<pubDate>Sat, 17 Feb 2018 10:17:02 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[cnv]]></category>
		<category><![CDATA[单细胞]]></category>
		<category><![CDATA[转录组]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=3065</guid>
		<description><![CDATA[Inferring CNV from single-cell transcriptome data. These papers all come from the Aviv Regev lab; a series of articles have used &#8230; <a href="http://www.bio-info-trainee.com/3065.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h1 class="md-end-block md-heading md-focus" contenteditable="true"><span class="">Inferring CNV from single-cell transcriptome data</span></h1>
<p><span class="md-line md-end-block" contenteditable="true"><span class="">The papers below all come from the Aviv Regev lab, and each of them infers copy number variation (CNV) from single-cell transcriptome data.</span></span><span id="more-3065"></span></p>
<h3 class="md-end-block md-heading" contenteditable="true"><span class=""> The 2014 Science paper on GBM</span></h3>
<p><span class="md-line md-end-block" contenteditable="true"><span class="">First is the 2014 Science paper on glioblastoma (GBM), PMID: </span><span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/pubmed/24925914">24925914</a></span>, which introduced this analysis and also used the CCLE database to validate its reliability.</span></p>
<p><span class="md-line md-end-block" contenteditable="true">The paper's own single-cell libraries were prepared with the SMART-seq protocol, and the data are available as <span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE57872">GSE57872</a></span>.</span></p>
<ul class="ul-list" data-mark="-">
<li><span class="md-line md-end-block" contenteditable="true">430(576) single glioblastoma cells isolated from 5 individual tumors</span></li>
<li><span class="md-line md-end-block" contenteditable="true">102(192) single cells from gliomasphere cell lines</span></li>
</ul>
<p><span class="md-line md-end-block" contenteditable="true">This single-cell library preparation and sequencing setup is somewhat dated by now:</span></p>
<blockquote><p><span class="md-line md-end-block" contenteditable="true">SMART-seq protocol was implemented to generate single cell full length transcriptomes (modified from Shalek, et al Nature 2013) and sequenced using 25 bp paired end reads. Single cell cDNA libraries for <span class=""><strong>MGH30 were resequenced using 100 bp paired end reads</strong></span> to allow for isoform and splice junction reconstruction (96 samples, annotated MGH30L). </span></p></blockquote>
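<p><span class="md-line md-end-block" contenteditable="true"><span class="">The authors therefore filtered quite strictly. You can either download their processed expression matrix directly, or download the raw sequencing data and run a transcriptome pipeline yourself.</span></span></p>
<p><span class="md-line md-end-block" contenteditable="true">The formula, as first proposed, is:</span></p>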
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-image md-img-loaded" contenteditable="false" data-src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/RNA-SEQ-CNV-formula-1.png"><img style="box-sizing: border-box; border-width: 0px 4px 0px 2px; border-right-style: solid; border-left-style: solid; border-right-color: transparent; border-left-color: transparent; vertical-align: middle; max-width: 100%; cursor: default;" src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/RNA-SEQ-CNV-formula-1.png" alt="" /></span></span></p>
<h3 class="md-end-block md-heading" contenteditable="true"><span class=""> The 2016 Science paper on melanoma</span></h3>
<p><span class="md-line md-end-block" contenteditable="true"><span class="">Next is the 2016 Science paper on melanoma, PMID: </span><span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/pubmed/27124452">27124452</a></span>, which also inferred CNV from single-cell transcriptome data. Its data are available as <span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE72056">GSE72056</a></span>. This time the libraries were prepared with Smart-seq2, for a total of 4645 cells; the expression matrix alone is 71.6 Mb. The raw sequencing data are in dbGaP and can only be downloaded after applying for access.</span></p>
<figure class="md-table-fig" contenteditable="false">
<table class="md-table">
<thead>
<tr class="md-end-block">
<th><span class="td-span" contenteditable="true"><span class=""><strong>Supplementary file</strong></span></span></th>
<th><span class="td-span" contenteditable="true"><span class=""><strong>Size</strong></span></span></th>
<th><span class="td-span" contenteditable="true"><span class=""><strong>Download</strong></span></span></th>
<th><span class="td-span" contenteditable="true"><span class=""><strong>File type/resource</strong></span></span></th>
</tr>
</thead>
<tbody>
<tr class="md-end-block">
<td><span class="td-span" contenteditable="true">GSE72056_melanoma_single_cell_revised_v2.txt.gz</span></td>
<td><span class="td-span" contenteditable="true">71.6 Mb</span></td>
<td><span class="td-span" contenteditable="true"><span class=""><a spellcheck="false" href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE72nnn/GSE72056/suppl/GSE72056_melanoma_single_cell_revised_v2.txt.gz">(ftp)</a></span><span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE72056&amp;format=file&amp;file=GSE72056%5Fmelanoma%5Fsingle%5Fcell%5Frevised%5Fv2%2Etxt%2Egz">(http)</a></span></span></td>
<td><span class="td-span" contenteditable="true">TXT</span></td>
</tr>
</tbody>
</table>
</figure>
<blockquote><p><span class="md-line md-end-block" contenteditable="true"><span class="">we applied single-cell RNA sequencing (RNA-seq) to 4645 single cells isolated from 19 patients, profiling malignant, immune, stromal, and endothelial cells.</span></span></p></blockquote>
<p><span class="md-line md-end-block" contenteditable="true">Note that the authors also performed bulk RNA-seq on six cases before and after treatment with RAF or RAF+MEK inhibitors, 12 datasets in total, available as <span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE77940">GSE77940</a></span>.</span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="">By this point the formula had changed slightly:</span></span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-image md-img-loaded" contenteditable="false" data-src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/rna-seq-cnv-formula2.png"><img style="box-sizing: border-box; border-width: 0px 4px 0px 2px; border-right-style: solid; border-left-style: solid; border-right-color: transparent; border-left-color: transparent; vertical-align: middle; max-width: 100%; cursor: default;" src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/rna-seq-cnv-formula2.png" alt="" /></span></span></p>
<h3 class="md-end-block md-heading" contenteditable="true"><span class=""> The 2017 Cell paper on head and neck cancer</span></h3>
<p><span class="md-line md-end-block" contenteditable="true">Then comes the head and neck cancer paper published in Cell in 2017: <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/science/article/pii/S0092867417312709?via%3Dihub">Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer</a></span>. The sequencing is described as follows:</span></p>
<blockquote><p><span class="md-line md-end-block" contenteditable="true">We profiled <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/topics/neuroscience/transcriptome">transcriptomes</a></span> of <span class=""><strong>∼6,000 single cells from 18 head and neck <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/topics/neuroscience/squamous-epithelial-cell">squamous cell</a></span> carcinoma</strong></span> (HNSCC) patients, including five matched pairs of primary tumors and <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/topics/neuroscience/lymph-node">lymph node metastases</a></span>.</span></p></blockquote>
<p><span class="md-line md-end-block" contenteditable="true">Whole-exome sequencing (WES) and targeted <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/genotyping">genotyping</a></span> (SNaPshot) data were also generated for these patients, but they are deposited in dbGaP under <span spellcheck="false"><code>phs001474.v1.p1</code></span> and are not very convenient to download.</span></p>
<p><span class="md-line md-end-block" contenteditable="true">The single-cell libraries were prepared with <span spellcheck="false"><code>Smart-seq2</code></span>, and all the data are available as <span class=""><a spellcheck="false" href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE103322">GSE103322</a></span>; the expression matrix alone is close to 100 Mb.</span></p>
<pre>GSE103322_HNSCC_all_data.txt.gz | 86.0 Mb |</pre>
<p><span class="md-line md-end-block" contenteditable="true">Download links: <span class=""><a spellcheck="false" href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE103nnn/GSE103322/suppl/GSE103322%5FHNSCC%5Fall%5Fdata%2Etxt%2Egz">(ftp)</a></span><span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE103322&amp;format=file&amp;file=GSE103322%5FHNSCC%5Fall%5Fdata%2Etxt%2Egz">(http)</a></span> </span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-image md-img-loaded" contenteditable="false" data-src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/rna-seq-cnv-formula-3.png"><img style="box-sizing: border-box; border-width: 0px 4px 0px 2px; border-right-style: solid; border-left-style: solid; border-right-color: transparent; border-left-color: transparent; vertical-align: middle; max-width: 100%; cursor: default;" src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/rna-seq-cnv-formula-3.png" alt="" /></span></span></p>
<h3 class="md-end-block md-heading" contenteditable="true">Validation with CCLE data</h3>
<p><span class="md-line md-end-block" contenteditable="true">The 2014 Science paper on GBM (PMID: <span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/pubmed/24925914">24925914</a></span><span class="">) states:</span></span></p>
<blockquote><p><span class="md-line md-end-block" contenteditable="true">We downloaded the CCLE gene-centric RMA-normalized Affymetrix data (<span spellcheck="false"><a href="http://www.broadinstitute.org/ccle/">http://www.broadinstitute.org/ccle/</a></span>), and centered the expression of each gene across all cell lines at zero.</span></p></blockquote>
<p><span class="md-line md-end-block" contenteditable="true">A simple registration is required before downloading: <span spellcheck="false"><a href="https://portals.broadinstitute.org/ccle/users/sign_in">https://portals.broadinstitute.org/ccle/users/sign_in</a></span> </span></p>
<p><span class="md-line md-end-block" contenteditable="true">In principle you should be able to obtain the figure below:</span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-image md-img-loaded" contenteditable="false" data-src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/highly-correlated-CNV-by-SNP6array-and-RNA-seq.png"><img style="box-sizing: border-box; border-width: 0px 4px 0px 2px; border-right-style: solid; border-left-style: solid; border-right-color: transparent; border-left-color: transparent; vertical-align: middle; max-width: 100%; cursor: default;" src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/highly-correlated-CNV-by-SNP6array-and-RNA-seq.png" alt="" /></span></span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-expand">This shows that the CNV profile inferred from transcriptome data differs little from the SNP6.0 array result.</span></span></p>
<h3 class="md-end-block md-heading" contenteditable="true">Further validation with the GTEx database</h3>
<p><span class="md-line md-end-block" contenteditable="true">To compare these patterns to an external reference of normal cells we downloaded RNA-Seq data from the GTEX portal (<span spellcheck="false"><a href="http://www.gtexportal.org/">http://www.gtexportal.org/</a></span><span class="">; gene read counts file from Jan. 2013), and estimated CNV values as above: we normalized the read counts into log2(TPM+1), averaged all brain samples, restricted the data to the ~6,000 analyzed genes, subtracted for each gene the average normalized expression from the GBM single-cell data (this step is comparable to the centering of the single cell data) and then used a moving average of 100 genes over the genomically-ordered list of genes to define CNV-cont.</span></span></p>
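<p>The two steps named in the passage above (center each gene, then take a moving average over genomically ordered genes) can be sketched roughly as follows. This is only an illustration under assumptions, not the papers' code: the function names <code>center_genes</code> and <code>moving_average_cnv</code> are hypothetical, the input is assumed to be a list of per-gene rows of log2(TPM+1) values already sorted by genomic position, and windows are simply truncated at the ends of the list. A real analysis would presumably also restrict each window to a single chromosome.</p>

```python
from statistics import mean


def center_genes(expr):
    """Subtract each gene's mean across cells (rows = genes, columns = cells)."""
    centered = []
    for row in expr:
        mu = mean(row)
        centered.append([value - mu for value in row])
    return centered


def moving_average_cnv(centered, window=100):
    """Average centered expression over a sliding window of neighbouring genes.

    `centered` must already be ordered by genomic position.
    Windows are truncated at both ends of the list (an assumption).
    """
    n_genes = len(centered)
    n_cells = len(centered[0]) if n_genes else 0
    half = window // 2
    cnv = []
    for i in range(n_genes):
        lo = max(0, i - half)
        hi = min(n_genes, i - half + window)
        block = centered[lo:hi]
        cnv.append([mean(row[c] for row in block) for c in range(n_cells)])
    return cnv
```

<p>Calling <code>moving_average_cnv(center_genes(expr))</code> returns one smoothed value per gene and cell; plotting those values along the genome is what would make large gains and losses visible.</p>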
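<h3 class="md-end-block md-heading" contenteditable="true">Summary</h3>
<p><span class="md-line md-end-block" contenteditable="true">All of the papers above provide downloadable expression matrices, so the formulas published in their supplementary materials are all you need to reproduce the whole workflow.</span></p>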
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/3065.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A hands-on RNA-seq example: super simple, done in 2 hours!</title>
		<link>http://www.bio-info-trainee.com/2218.html</link>
		<comments>http://www.bio-info-trainee.com/2218.html#comments</comments>
		<pubDate>Fri, 30 Dec 2016 08:38:33 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[转录组软件]]></category>
		<category><![CDATA[RNA-seq]]></category>
		<category><![CDATA[表达量]]></category>
		<category><![CDATA[转录组]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2218</guid>
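		<description><![CDATA[Please do not simply copy my code. Understand it, type it out yourself, and think about why I wrote it this way. Software &#8230; <a href="http://www.bio-info-trainee.com/2218.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div><span style="color: #ff0000;"><strong>Please do not simply copy my code. Understand it, type it out yourself, and think about why I wrote it this way.</strong></span></div>
<div><span style="color: #ff0000;"><strong>Please use the latest versions of the software, especially tools such as samtools that I keep on my system PATH. Since there are many readers, I normally include the version information for the other tools in their paths!</strong></span></div>
<div>It took me two hours, but that does not mean you will learn it in two hours. Some readers have told me it took them two weeks, which is perfectly normal; do not expect to reach my level in two hours.</div>
<div></div>
<div>If you only care about expression levels, transcriptome analysis really is very simple. Moreover, the authors sequenced single-end 50 bp reads (SE50), and data like that is only good for measuring expression anyway!</div>
<div>First, the authors' own analysis result:</div>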
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/17.png"><img class="alignnone size-full wp-image-2224" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/17.png" alt="1" width="619" height="325" /></a></div>
<p><span id="more-2218"></span></p>
<div>The data are in GEO at: <a href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50177">https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50177</a></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/25.png"><img class="alignnone size-full wp-image-2225" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/25.png" alt="2" width="622" height="388" /></a></div>
<div>The RNA-seq data we need to download:</div>
<div><a href="https://www.ncbi.nlm.nih.gov//sra/?term=SRP029245">https://www.ncbi.nlm.nih.gov//sra/?term=SRP029245</a></div>
<div><a href="https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP029245">https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP029245</a></div>
<div><a href="ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP029/SRP029245">ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP029/SRP029245</a></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/33.png"><img class="alignnone size-full wp-image-2219" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/33.png" alt="3" width="690" height="79" /></a></div>
<div>The download URLs are easy to work out:</div>
<div>for ((i=677;i&lt;=680;i++)) ;do wget <a href="ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP029/SRP029245">ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP029/SRP029245</a>/SRR957$i/SRR957$i.sra;done</div>
<div>ls *sra |while read id; do ~/biosoft/sratoolkit/sratoolkit.2.6.3-centos_linux64/bin/fastq-dump --split-3 $id;done</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/42.png"><img class="alignnone size-full wp-image-2220" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/42.png" alt="4" width="339" height="160" /></a></div>
<div></div>
<div>I checked the data quality with FastQC and found no problems. The command was:</div>
<div>ls *fastq |xargs ~/biosoft/fastqc/FastQC/fastqc -t 10</div>
<div>So the FASTQ files were aligned directly to the hg19 reference genome with hisat2:</div>
<div>reference=/home/jianmingzeng/reference/index/hisat/hg19/genome</div>
<div>~/biosoft/HISAT/current/hisat2 -p 5 -x $reference -U SRR957677.fastq -S control_1.sam 2&gt;control_1.log</div>
<div>~/biosoft/HISAT/current/hisat2 -p 5 -x $reference -U SRR957678.fastq -S control_2.sam 2&gt;control_2.log</div>
<div>~/biosoft/HISAT/current/hisat2 -p 5 -x $reference -U SRR957679.fastq -S siSUZ12_1.sam 2&gt;siSUZ12_1.log</div>
<div>~/biosoft/HISAT/current/hisat2 -p 5 -x $reference -U SRR957680.fastq -S siSUZ12_2.sam 2&gt;siSUZ12_2.log</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/51.png"><img class="alignnone size-full wp-image-2221" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/51.png" alt="5" width="229" height="64" /></a></div>
<div></div>
<div>The logs show that the alignment rates are excellent:</div>
<div>93.10% overall alignment rate<br />
92.44% overall alignment rate<br />
92.36% overall alignment rate<br />
93.22% overall alignment rate</div>
<div></div>
<div>Next, sort the SAM files by read name and convert them to BAM to save space:</div>
<div>ls *sam |while read id;do (nohup samtools sort -n -@ 5 -o ${id%%.*}.Nsort.bam $id &amp;);done</div>
<div><img class="alignnone size-full wp-image-2222" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/6.png" alt="6" width="271" height="75" /></div>
<div>Finally, use htseq-count to quantify per-gene expression for each sample:</div>
<div>ls *.Nsort.bam |while read id;do (nohup samtools view $id | ~/.local/bin/htseq-count -f sam -s no -i gene_name - ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf 1&gt;${id%%.*}.geneCounts 2&gt;${id%%.*}.HTseq.log&amp;);done</div>
<div>This produces one <code>.geneCounts</code> file and one <code>.HTseq.log</code> file per sample.</div>
<div></div>
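<div>The gene counts for these four samples can then be fed to any of a number of R packages for differential expression analysis, including limma (voom), DESeq2, and edgeR. Their usage is documented everywhere, so I will not go into it here.</div>
<div>Once the differential analysis is done, you can compare it with the authors' results to check that you got it right.</div>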
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/7.png"><img class="alignnone size-full wp-image-2223" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/7.png" alt="7" width="930" height="615" /></a></div>
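<div>Before handing the counts to one of those R packages, the per-sample files have to be combined into a single matrix. One possible way to do that is sketched below in Python. It is an illustration rather than part of the original workflow: the function names <code>read_htseq_counts</code> and <code>build_count_matrix</code> are hypothetical, and the sample name is assumed to be everything before the first dot in the file name, mirroring the <code>${id%%.*}</code> expansion used above.</div>

```python
import csv
from pathlib import Path


def read_htseq_counts(path):
    """Return {gene: count}, skipping htseq-count's '__' summary rows."""
    counts = {}
    with open(path, newline="") as handle:
        for gene, value in csv.reader(handle, delimiter="\t"):
            if gene.startswith("__"):
                continue
            counts[gene] = int(value)
    return counts


def build_count_matrix(paths):
    """Combine per-sample files into (sample_names, {gene: [counts...]})."""
    samples = [Path(p).name.split(".")[0] for p in paths]
    per_sample = [read_htseq_counts(p) for p in paths]
    genes = sorted(set().union(*per_sample))
    matrix = {g: [d.get(g, 0) for d in per_sample] for g in genes}
    return samples, matrix
```

<div>Genes missing from a file are filled with 0 here, which is a simplifying assumption; with a single shared GTF every file should list the same genes anyway.</div>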
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2218.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>hisat2+stringtie+ballgown</title>
		<link>http://www.bio-info-trainee.com/2073.html</link>
		<comments>http://www.bio-info-trainee.com/2073.html#comments</comments>
		<pubDate>Fri, 25 Nov 2016 15:06:23 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[转录组软件]]></category>
		<category><![CDATA[ballgown]]></category>
		<category><![CDATA[hisat2]]></category>
		<category><![CDATA[StringTie]]></category>
		<category><![CDATA[转录组]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2073</guid>
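		<description><![CDATA[Back in September last year I wrote a post saying that the RNA-seq pipeline needed to evolve! http://ww &#8230; <a href="http://www.bio-info-trainee.com/2073.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Back in September last year I wrote a post saying that the RNA-seq pipeline needed to evolve:<a href="http://www.bio-info-trainee.com/1022.html" target="_blank"> http://www.bio-info-trainee.com/1022.html </a>. The main point was to move to a hisat2+stringtie+ballgown workflow, but I never explained that workflow systematically because I honestly did not find it useful; I only use hisat2 from it, for alignment. Still, people in the group keep asking about it, so I am reluctantly writing a tutorial. You can copy my code directly to install these tools, then find a test dataset yourself; my scripts are easy to use!<span id="more-2073"></span></p>
<div>Papers like this are actually my favorite: <a href="http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html">http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html</a> The authors even provide all of the code, so I do not see why anyone would still have questions: <a href="http://www.nature.com/nprot/journal/v11/n9/extref/nprot.2016.095-S1.zip" target="_blank">http://www.nature.com/nprot/journal/v11/n9/extref/nprot.2016.095-S1.zip</a></div>
<div>They have already laid out the workflow very clearly, so I will just share my own take:</div>
<div>Software installation:</div>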
<blockquote>
<div>## Download and install HISAT</div>
<div># https://ccb.jhu.edu/software/hisat2/index.shtml</div>
<div>cd ~/biosoft</div>
<div>mkdir HISAT &amp;&amp; cd HISAT</div>
<div>#### readme: https://ccb.jhu.edu/software/hisat2/manual.shtml</div>
<div>wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.0.4-Linux_x86_64.zip</div>
<div>unzip hisat2-2.0.4-Linux_x86_64.zip</div>
<div>ln -s hisat2-2.0.4 current</div>
<div>## ~/biosoft/HISAT/current/hisat2-build</div>
<div>## ~/biosoft/HISAT/current/hisat2</div>
<div></div>
<div>## Download and install StringTie</div>
<div>## https://ccb.jhu.edu/software/stringtie/ ## https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual</div>
<div>cd ~/biosoft</div>
<div>mkdir StringTie &amp;&amp; cd StringTie</div>
<div>wget http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.2.3.Linux_x86_64.tar.gz</div>
<div>tar zxvf stringtie-1.2.3.Linux_x86_64.tar.gz</div>
<div>ln -s stringtie-1.2.3.Linux_x86_64 current</div>
<div># ~/biosoft/StringTie/current/stringtie</div>
</blockquote>
<div></div>
<div>As for running the software, I prefer shell scripts, and simple ones at that:</div>
<div>
<blockquote>
<div>while read id</div>
<div>do</div>
<div>sample=$(echo $id |cut -d" " -f 1 )</div>
<div>file1=$(echo $id |cut -d" " -f 2 )</div>
<div>file2=$(echo $id |cut -d" " -f 3 )</div>
<div>echo  $sample</div>
<div>echo $file1</div>
<div>echo $file2</div>
<div>~/biosoft/HISAT/current/hisat2  -p 4 --dta  -x  ~/reference/index/hisat/hg19/genome  -1 $file1 -2 $file2 -S $sample.hisat2.hg19.sam 2&gt;$sample.hisat2.hg19.log &amp;</div>
<div>done &lt;$1</div>
</blockquote>
<div>The script above takes a three-column input file (sample name, read 1 file, read 2 file) and produces the SAM files shown below.</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/16.png"><img class="alignnone size-full wp-image-2074" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/16.png" alt="1" width="298" height="63" /></a></div>
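<div>To make the expected input format concrete, here is a small Python sketch that turns such a three-column sheet into the corresponding hisat2 argument lists without running anything. It is only an illustration: the function name <code>hisat2_commands</code> and its default values are assumptions, while the hisat2 flags are the same ones used in the shell loop above.</div>

```python
def hisat2_commands(sheet_lines, index="~/reference/index/hisat/hg19/genome", threads=4):
    """Yield (argv, log_path) for each 'sample read1 read2' line."""
    for line in sheet_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        sample, read1, read2 = fields
        argv = [
            "hisat2", "-p", str(threads), "--dta", "-x", index,
            "-1", read1, "-2", read2,
            "-S", f"{sample}.hisat2.hg19.sam",
        ]
        yield argv, f"{sample}.hisat2.hg19.log"
```

<div>Each argument list could then be handed to <code>subprocess.run</code> with stderr redirected to the log path, which is what the <code>2&gt;</code> redirection does in the shell version.</div>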
<blockquote>
<div>while read id</div>
<div>do</div>
<div>file=$(basename $id )</div>
<div>sample=${file%%.*}</div>
<div>echo $id $sample</div>
<div>nohup samtools sort -@ 4 -o ${sample}.sorted.bam $id &amp;</div>
<div>done &lt;$1</div>
</blockquote>
<div><span style="color: #ff0000;">Recent versions of samtools can convert a SAM file directly into a sorted BAM file.</span></div>
<div><img class="alignnone size-full wp-image-2075" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/23.png" alt="2" width="266" height="65" /></div>
<blockquote>
<div>while read id</div>
<div>do</div>
<div>file=$(basename $id )</div>
<div>sample=${file%%.*}</div>
<div>echo $id $sample</div>
<div>nohup ~/biosoft/StringTie/current/stringtie  -p 4  -G ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf  -o $sample.hg19.stringtie.gtf -l $sample  $id  &amp;</div>
<div>done &lt;$1</div>
</blockquote>
<div>That is all there is to using StringTie; there is not much more to explain.</div>
<div><img class="alignnone size-full wp-image-2076" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/31.png" alt="3" width="318" height="82" /></div>
<div></div>
<div> ~/biosoft/StringTie/current/stringtie   --merge -p 8 -G ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf  -o stringtie_merged.gtf  mergelist.txt</div>
<div></div>
<div></div>
<div>while read id</div>
<div>do</div>
<div>file=$(basename $id )</div>
<div>sample=${file%%.*}</div>
<div>echo $id $sample</div>
<div>nohup ~/biosoft/StringTie/current/stringtie -e -B  -G  $2  -o ballgown/$sample/$sample.hg19.stringtie.gtf   $id  &amp;</div>
<div>done &lt;$1</div>
</div>
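<div>I really cannot go on, because I genuinely do not use this toolchain. <strong><span style="color: #ff0000;">Once I have the SAM/BAM files I go straight to building the count matrix</span></strong>, and counting reads is very easy. The code is:</div>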
<div>nohup samtools view   A.sorted.bam.Nsort.bam |  ~/.local/bin/htseq-count -f sam  -s no -i gene_name  -   ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf    1&gt;A.geneCounts 2&gt;A.HTseq.log &amp;</div>
<div>Import the files below into R and process them with ballgown. Please do not ask me about this again.</div>
<div><img class="alignnone size-full wp-image-2077" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/4.png" alt="4" width="608" height="548" /></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2073.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differences between htseq-count and bedtools</title>
		<link>http://www.bio-info-trainee.com/2022.html</link>
		<comments>http://www.bio-info-trainee.com/2022.html#comments</comments>
		<pubDate>Tue, 15 Nov 2016 03:55:21 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[转录组软件]]></category>
		<category><![CDATA[bedtools]]></category>
		<category><![CDATA[htseq-counts]]></category>
		<category><![CDATA[转录组]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2022</guid>
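		<description><![CDATA[I have previously written tutorials on bedtools and htseq-count; both can be used on aligned &#8230; <a href="http://www.bio-info-trainee.com/2022.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I have previously written tutorials on bedtools and htseq-count. Both can count reads in an aligned BAM file, and since someone in the group asked me how they differ, I did a quick comparison. You may want to read my earlier tutorials first, though they are a bit rough:</p>
<p><a title="Read more: Gene counting for RNA-seq with Bedtools" href="http://www.bio-info-trainee.com/745.html" rel="bookmark">Gene counting for RNA-seq with Bedtools</a>,</p>
<p><a title="Read more: Counting gene expression with HTSeq for transcriptome data" href="http://www.bio-info-trainee.com/244.html" rel="bookmark">Counting gene expression with HTSeq for transcriptome data</a></p>
<p>Back to the point. I do not have the energy to dig into how each tool works internally; I only want to look at how they differ when deciding whether a read belongs to a gene. See the figure below:<span id="more-2022"></span></p>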
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/bedtoos-vs-htseq.png"><img class="alignnone size-full wp-image-2023" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/bedtoos-vs-htseq.png" alt="bedtoos-vs-htseq" width="707" height="485" /></a></p>
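<div>Clearly, bedtools counts indiscriminately: as long as a read's aligned coordinates overlap the target gene's coordinates, the read is counted, regardless of whether it is multi-mapped.</div>
<div>htseq-count is much more cautious, and it also lets you choose the overlap mode. In general it assigns multi-mapped reads to the <code>__alignment_not_unique</code> category.</div>
<div>Also, always double-check your results after an analysis. If hisat2 reports a mapping rate above 90%, then even after setting aside the roughly 15% of multi-mapped reads, you should still be able to count at least 50 to 60% of the reads!</div>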
<div></div>
<div>If you see a very large number of no_feature reads, that should tell you something is wrong!</div>
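<div>That sanity check can be automated. The Python sketch below reads htseq-count output lines and reports what fraction of reads ended up in genes versus in each of the special <code>__</code> counters written at the end of the file. The function name <code>summarize_htseq</code> is hypothetical, and treating the sum of all rows as the total is an assumption that may not hold exactly for every htseq-count version or mode.</div>

```python
def summarize_htseq(lines):
    """Return the fraction of reads assigned to genes and to each '__' counter."""
    assigned = 0
    special = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, value = line.split("\t")
        if name.startswith("__"):
            special[name[2:]] = int(value)
        else:
            assigned += int(value)
    total = assigned + sum(special.values())
    fractions = {key: val / total for key, val in special.items()} if total else {}
    fractions["assigned"] = assigned / total if total else 0.0
    return fractions
```

<div>For example, <code>summarize_htseq(open("control.geneCounts"))</code> would show at a glance whether <code>no_feature</code> dominates; what counts as too high is left to the reader's judgment.</div>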
<div></div>
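<div>Finally, a few htseq-count parameters deserve particular attention:</div>
<div>Official documentation: <a href="http://www-huber.embl.de/HTSeq/doc/count.html">http://www-huber.embl.de/HTSeq/doc/count.html</a></div>
<div>The reference GTF can come from GENCODE or Ensembl, but pay particular attention to the chr prefix on chromosome names and to the annotation version; whether the file is GTF or GFF does not matter. The alignment file must be sorted, and I strongly recommend sort -n, that is, sorting by read name.</div>
<div>-f sam/bam: be sure to get this right. To count directly from a BAM file, the Python on your server must have a working pysam module installed.</div>
<div>-r name/pos: our BAM files are usually sorted by reference position, but this tool defaults to read-name order, which is a trap. I generally recommend re-sorting the BAM file rather than using -r pos, because -r pos consumes far too much memory.</div>
<div>-s yes/no/reverse: another major trap. The default is yes, but most people's data are unstranded (no), so be very careful!</div>
<div>-t: the feature type to use from column 3 of the GFF/GTF file, normally exon; gene or transcript are also possible. This is rarely changed.</div>
<div>-i: this one needs changing. The default is gene_id, which for these annotations is the Ensembl gene ID and is hard to read; you can switch to gene_name, provided your GFF file has a gene_name attribute.</div>
<div>The rest can be left at their defaults.</div>
<div>My code:</div>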
<blockquote>
<div>nohup samtools view control.Nsort.bam | ~/.local/bin/htseq-count -f sam -s no -i gene_name - ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf 1&gt;control.geneCounts 2&gt;control.HTseq.log &amp;</div>
<div>nohup samtools view G34V.Nsort.bam | ~/.local/bin/htseq-count -f sam -s no -i gene_name - ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf 1&gt;G34V.geneCounts 2&gt;G34V.HTseq.log &amp;</div>
<div>nohup samtools view K27M.Nsort.bam | ~/.local/bin/htseq-count -f sam -s no -i gene_name - ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf 1&gt;K27M.geneCounts 2&gt;K27M.HTseq.log &amp;</div>
<div>nohup samtools view WT.Nsort.bam | ~/.local/bin/htseq-count -f sam -s no -i gene_name - ~/reference/gtf/gencode/gencode.v25lift37.annotation.gtf 1&gt;WT.geneCounts 2&gt;WT.HTseq.log &amp;</div>
<div></div>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2022.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using samtools idxstats to quantify expression for de novo transcriptome data</title>
		<link>http://www.bio-info-trainee.com/1974.html</link>
		<comments>http://www.bio-info-trainee.com/1974.html#comments</comments>
		<pubDate>Mon, 31 Oct 2016 09:16:48 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础软件]]></category>
		<category><![CDATA[de novo]]></category>
		<category><![CDATA[idxstats]]></category>
		<category><![CDATA[samtools]]></category>
		<category><![CDATA[转录组]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1974</guid>
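		<description><![CDATA[For de novo transcriptome data, alignment is usually done against your own assembled trinity.fa &#8230; <a href="http://www.bio-info-trainee.com/1974.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>For de novo transcriptome data, reads are usually aligned against your own assembled trinity.fasta sequences (keeping the transcript with the longest protein), using a tool such as bowtie2 to align the raw reads directly. The resulting SAM/BAM file therefore already carries the ID of every transcript in the reference, so simply counting reads in the SAM/BAM file gives the expression level of each gene!</p>
<p>I was originally going to write my own script to count reads in the SAM file, but it turns out samtools ships with a tool for exactly this: <a href="http://www.htslib.org/doc/samtools.html" target="_blank">http://www.htslib.org/doc/samtools.html</a></p>
<p>For genomic references this feature is of little use, but against transcript sequences the output is exactly the per-transcript expression we want.<span id="more-1974"></span></p>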
<blockquote><p><span style="color: #ff0000;"><strong>samtools idxstats</strong></span> tmp.bowtie2.sorted.bam |head<br />
TR3|c0_g1_i1 1276 418 0<br />
TR6|c0_g1_i1 1271 10 0<br />
TR6|c0_g1_i2 944 5 0<br />
TR6|c0_g1_i3 1281 4 0<br />
TR6|c0_g1_i4 1224 53 0<br />
TR6|c0_g1_i5 855 16 0<br />
TR19|c0_g1_i2 1428 19 0<br />
TR19|c0_g1_i3 2536 624 0<br />
TR19|c0_g1_i4 3072 105 0<br />
TR19|c0_g1_i5 1685 0 0</p></blockquote>
<p>The official manual states it clearly:</p>
<p>samtools idxstats <em>in.sam</em>|<em>in.bam</em>|<em>in.cram</em></p>
<p>Retrieve and print stats in the index file corresponding to the input file. Before calling idxstats, the input BAM file must be indexed by samtools index.</p>
<p>The output is TAB-delimited with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads. It is written to stdout.</p>
<p>第三列，就是我们想要的表达量数据啦，比对到每个转录本序列的reads数量。</p>
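<p><span style="color: #ff0000;">If you can spot any problem from my transcript sequence IDs, I would welcome a discussion; just email me at jmzeng1314@163.com </span></p>
<p>Now that we know the expression of each transcript, running this on every sample gives the expression matrix, and differential analysis becomes straightforward. The result, however, is a list of differential transcripts whose IDs carry no meaning by themselves, so they need to be annotated before any downstream analysis.</p>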
<blockquote><p>for id in *sorted.bam<br />
do<br />
# ${id%%.*} strips everything from the first dot, e.g. tmp.bowtie2.sorted.bam -&gt; tmp<br />
echo "$id" "${id%%.*}.t.counts"<br />
nohup samtools idxstats "$id" 1&gt;"${id%%.*}.t.counts" 2&gt;/dev/null  &amp;<br />
done</p></blockquote>
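<p>Once every sample has its own counts file, the files can be joined into one expression matrix. The following is only a sketch under stated assumptions: the function name <code>merge_counts</code> is hypothetical, each input file is unmodified TAB-delimited idxstats output, and the column headers are simply the input file names.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge per-sample idxstats tables into one count matrix.
# Usage: merge_counts sampleA.t.counts sampleB.t.counts > matrix.tsv
merge_counts() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        FNR == 1 { nfile++; name[nfile] = FILENAME }    # a new input file starts
        $1 == "*" { next }                               # skip the line for reads without a reference
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }  # remember first-seen order
        { count[$1, nfile] = $3 }                        # column 3 = mapped reads
        END {
            header = "transcript"
            for (f = 1; f <= nfile; f++) header = header OFS name[f]
            print header
            for (i = 1; i <= n; i++) {
                row = order[i]
                for (f = 1; f <= nfile; f++) {
                    key = order[i] SUBSEP f
                    # a transcript missing from one file gets a count of 0
                    row = row OFS ((key in count) ? count[key] : 0)
                }
                print row
            }
        }' "$@"
}
```

<p>The output has one row per transcript and one column per sample, which is the shape that count-based differential expression tools generally expect as input.</p>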
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1974.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The most comprehensive collection of transcriptome analysis software</title>
		<link>http://www.bio-info-trainee.com/1055.html</link>
		<comments>http://www.bio-info-trainee.com/1055.html#comments</comments>
		<pubDate>Fri, 16 Oct 2015 11:40:49 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Collection]]></category>
		<category><![CDATA[Transcriptome]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1055</guid>
		<description><![CDATA[I came across this site purely by accident. Looking at it now, people abroad really are more thorough; this software list &#8230; <a href="http://www.bio-info-trainee.com/1055.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>I came across this site purely by accident. Looking at it now, people abroad really are more thorough: this software list shows that its authors put in a great deal of effort and care. From: <a href="https://en.wiki2.org/wiki/List_of_RNA-Seq_bioinformatics_tools" target="_blank">https://en.wiki2.org/wiki/List_of_RNA-Seq_bioinformatics_tools</a></div>
<div>The list covers the following 18 areas of transcriptome analysis. Reading it made me realize how far my own skills fall short: my picture of transcriptome analysis was differential expression, then annotation, at most a look at fusion genes, or else miRNA and lncRNA. I had no idea there was so much more to it. No wonder biology is such a deep pit: however many researchers arrive, there are always more research directions to hand out.
</div>
<div>    1 Quality control and pre-processing data</div>
<div>        1.1 Quality control and filtering data</div>
<div>        1.2 Detection of chimeric reads</div>
<div>        1.3 Errors Correction</div>
<div>        1.4 Pre-processing data</div>
<div>    2 Alignment Tools</div>
<div>        2.1 Short (Unspliced) aligners</div>
<div>        2.2 Spliced aligners</div>
<div>            2.2.1 Aligners based on known splice junctions (annotation-guided aligners)</div>
<div>            2.2.2 De novo Splice Aligners</div>
<div>                2.2.2.1 De novo Splice Aligners that also use annotation optionally</div>
<div>                2.2.2.2 Other Spliced Aligners</div>
<div>    3 Normalization, Quantitative analysis and Differential Expression</div>
<div>        3.1 Multi-tool solutions</div>
<div>    4 Workbench (analysis pipeline / integrated solutions)</div>
<div>        4.1 Commercial Solutions</div>
<div>        4.2 Open (free) Source Solutions</div>
<div>    5 Alternative Splicing Analysis</div>
<div>        5.1 General Tools</div>
<div>        5.2 Intron Retention Analysis</div>
<div>    6 Bias Correction</div>
<div>    7 Fusion genes/chimeras/translocation finders/structural variations</div>
<div>    8 Copy Number Variation identification</div>
<div>    9 RNA-Seq simulators</div>
<div>    10 Transcriptome assemblers</div>
<div>        10.1 Genome-Guided assemblers</div>
<div>        10.2 Genome-Independent (de novo) assemblers</div>
<div>            10.2.1 Assembly evaluation tools</div>
<div>    11 Co-expression networks</div>
<div>    12 miRNA prediction</div>
<div>    13 Visualization tools</div>
<div>    14 Functional, Network &amp; Pathway Analysis Tools</div>
<div>    15 Further annotation tools for RNA-Seq data</div>
<div>    16 RNA-Seq Databases</div>
<div>    17 Webinars and Presentations</div>
<div>    18 References</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1055.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The complete RNA-seq study handbook!</title>
		<link>http://www.bio-info-trainee.com/703.html</link>
		<comments>http://www.bio-info-trainee.com/703.html#comments</comments>
		<pubDate>Tue, 05 May 2015 04:57:08 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Miscellaneous notes]]></category>
		<category><![CDATA[RNA]]></category>
		<category><![CDATA[Transcriptome]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=703</guid>
		<description><![CDATA[Takes about two months! If the cloud-drive material inside has expired, contact me directly at 1227278128, or my &#8230; <a href="http://www.bio-info-trainee.com/703.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h3>Takes about two months! If the cloud-drive material inside has expired, contact me directly at 1227278128 or through my group 201161227. All the resources can be found at <a href="http://pan.baidu.com/s/1jIvwRD8" target="_blank">http://pan.baidu.com/s/1jIvwRD8</a></h3>
<p>A search turns up a great many pipelines; here I briefly share some of the literature I found earlier.</p>
<p>&nbsp;</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/RNA-seq完整学习手册141.png"><img class="alignnone size-full wp-image-704" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/RNA-seq完整学习手册141.png" alt="RNA-seq完整学习手册141" width="554" height="332" /></a></p>
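<p>Peking University also has lectures on the principles of RNA-seq</p>
<p>Link: http://pan.baidu.com/s/1kTmWmv9 Password: 6yaz</p>
<p>I even have a BGI training course!!! It is a 5-day training course, and as I recall the material cost more than five thousand yuan at the time!!!</p>
<p>Link: http://pan.baidu.com/s/1nt5OV5B Password: gyul</p>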
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/RNA-seq完整学习手册294.png"><img class="alignnone size-full wp-image-705" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/RNA-seq完整学习手册294.png" alt="RNA-seq完整学习手册294" width="263" height="157" /></a></p>
<p>Youku also has videos; you can search for them yourself</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/RNA-seq完整学习手册312.png"><img class="alignnone size-full wp-image-706" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/RNA-seq完整学习手册312.png" alt="RNA-seq完整学习手册312" width="410" height="254" /></a></p>
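<p>Then there are a few pipelines, that is, bioinformatics analysis workflows. Even if you know nothing yet, following a pipeline step by step is not a problem</p>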
<p>export PATH=/share/software/bin:$PATH</p>
<p>bowtie2-build ./data/GRCh37_chr21.fa  chr21</p>
<p>tophat -p 1 -G ./data/genes.gtf -o P460.thout chr21 ./data/P460_R1.fq  ./data/P460_R2.fq</p>
<p>tophat -p 1 -G ./data/genes.gtf -o C460.thout chr21 ./data/C460_R1.fq  ./data/C460_R2.fq</p>
<p>cufflinks -p 1 -o P460.clout P460.thout/accepted_hits.bam</p>
<p>cufflinks -p 1 -o C460.clout C460.thout/accepted_hits.bam</p>
<p>samtools  view  -h  P460.thout/accepted_hits.bam  &gt;  P460.thout/accepted_hits.sam</p>
<p>samtools  view  -h  C460.thout/accepted_hits.bam  &gt;  C460.thout/accepted_hits.sam</p>
<p>echo ./P460.clout/transcripts.gtf &gt; assemblies.txt</p>
<p>echo ./C460.clout/transcripts.gtf &gt;&gt; assemblies.txt</p>
<p>cuffmerge -p 1 -g ./data/genes.gtf -s ./data/GRCh37_chr21.fa  assemblies.txt</p>
<p>cuffdiff -p 1 -u merged_asm/merged.gtf  -b ./data/GRCh37_chr21.fa  -L P460,C460 -o P460-C460.diffout P460.thout/accepted_hits.bam C460.thout/accepted_hits.bam</p>
<p>samtools  index  P460.thout/accepted_hits.bam</p>
<p>samtools  index  C460.thout/accepted_hits.bam</p>
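<p>The per-sample steps above repeat for P460 and C460, so they can be generated from a loop. This is only a dry-run sketch: the function name <code>print_sample_cmds</code> is hypothetical, and it prints the commands rather than running them. The paths and flags are copied from the commands above.</p>

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the per-sample commands instead of running them.
# Paths and flags are copied from the pipeline above; nothing is executed.
print_sample_cmds() {
    for s in "$@"; do
        echo "tophat -p 1 -G ./data/genes.gtf -o ${s}.thout chr21 ./data/${s}_R1.fq ./data/${s}_R2.fq"
        echo "cufflinks -p 1 -o ${s}.clout ${s}.thout/accepted_hits.bam"
        echo "samtools index ${s}.thout/accepted_hits.bam"
    done
}

# Print the commands for the two samples used above.
print_sample_cmds P460 C460
```

<p>Printing first makes it easy to review the commands before piping them to a shell or a job scheduler.</p>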
<p>&nbsp;</p>
<p>And another one</p>
<p>#!/bin/bash</p>
<p># Approx 75-80m to complete as a script</p>
<p>cd ~/RNA-seq</p>
<p>ls -l data</p>
<p>&nbsp;</p>
<p>tophat --help</p>
<p>&nbsp;</p>
<p>head -n 20 data/2cells_1.fastq</p>
<p>&nbsp;</p>
<p>time tophat --solexa-quals \</p>
<p>-g 2 \</p>
<p>--library-type fr-unstranded \</p>
<p>-j annotation/Danio_rerio.Zv9.66.spliceSites \</p>
<p>-o tophat/ZV9_2cells \</p>
<p>genome/ZV9 \</p>
<p>data/2cells_1.fastq data/2cells_2.fastq                  # 17m30s</p>
<p>&nbsp;</p>
<p>time tophat --solexa-quals \</p>
<p>-g 2 \</p>
<p>--library-type fr-unstranded \</p>
<p>-j annotation/Danio_rerio.Zv9.66.spliceSites \</p>
<p>-o tophat/ZV9_6h \</p>
<p>genome/ZV9 \</p>
<p>data/6h_1.fastq data/6h_2.fastq                          # 17m30s</p>
<p>&nbsp;</p>
<p>samtools index tophat/ZV9_2cells/accepted_hits.bam</p>
<p>samtools index tophat/ZV9_6h/accepted_hits.bam</p>
<p>&nbsp;</p>
<p>cufflinks --help</p>
<p>time cufflinks  -o cufflinks/ZV9_2cells_gff \</p>
<p>-G annotation/Danio_rerio.Zv9.66.gtf \</p>
<p>-b genome/Danio_rerio.Zv9.66.dna.fa \</p>
<p>-u \</p>
<p>--library-type fr-unstranded \</p>
<p>tophat/ZV9_2cells/accepted_hits.bam                  # 2m</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>time cufflinks  -o cufflinks/ZV9_6h_gff \</p>
<p>-G annotation/Danio_rerio.Zv9.66.gtf \</p>
<p>-b genome/Danio_rerio.Zv9.66.dna.fa \</p>
<p>-u \</p>
<p>--library-type fr-unstranded \</p>
<p>tophat/ZV9_6h/accepted_hits.bam                      # 2m</p>
<p>&nbsp;</p>
<p># guided assembly</p>
<p>time cufflinks  -o cufflinks/ZV9_2cells \</p>
<p>-g annotation/Danio_rerio.Zv9.66.gtf \</p>
<p>-b genome/Danio_rerio.Zv9.66.dna.fa \</p>
<p>-u \</p>
<p>--library-type fr-unstranded \</p>
<p>tophat/ZV9_2cells/accepted_hits.bam                  # 16m</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>time cufflinks  -o cufflinks/ZV9_6h \</p>
<p>-g annotation/Danio_rerio.Zv9.66.gtf \</p>
<p>-b genome/Danio_rerio.Zv9.66.dna.fa \</p>
<p>-u \</p>
<p>--library-type fr-unstranded \</p>
<p>tophat/ZV9_6h/accepted_hits.bam                      # 13m</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>time cuffdiff -o cuffdiff/ \</p>
<p>-L ZV9_2cells,ZV9_6h \</p>
<p>-T \</p>
<p>-b genome/Danio_rerio.Zv9.66.dna.fa \</p>
<p>-u \</p>
<p>--library-type fr-unstranded \</p>
<p>annotation/Danio_rerio.Zv9.66.gtf \</p>
<p>tophat/ZV9_2cells/accepted_hits.bam \</p>
<p>tophat/ZV9_6h/accepted_hits.bam                        # 7m</p>
<p>&nbsp;</p>
<p>head -n 20 cuffdiff/gene_exp.diff</p>
<p>&nbsp;</p>
<p>sort -t$'\t' -g -k 13 cuffdiff/gene_exp.diff \</p>
<p>&gt; cuffdiff/gene_exp_qval.sorted.diff</p>
<p>&nbsp;</p>
<p>head -n 20 cuffdiff/gene_exp_qval.sorted.diff</p>
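<p>After sorting, a natural next step is to keep only the rows below a q-value cutoff. A minimal sketch follows. The function name <code>filter_by_qval</code> and the default cutoff of 0.05 are assumptions, and it relies on column 13 of gene_exp.diff being the q-value, which is the same column the sort command above uses.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the header line plus rows whose q-value is below a cutoff.
# Assumes a TAB-delimited cuffdiff gene_exp.diff with the q-value in column 13.
filter_by_qval() {
    cutoff="${1:-0.05}"   # assumed default cutoff
    # NR == 1 keeps the header; "$13 + 0" forces a numeric comparison.
    awk -F'\t' -v cutoff="$cutoff" 'NR == 1 || ($13 + 0) < cutoff'
}

# Example usage:
#   filter_by_qval 0.05 < cuffdiff/gene_exp.diff > cuffdiff/gene_exp.sig.diff
```

<p>Filtering the unsorted file keeps the header on the first line, whereas the sort command above sorts the header row together with the data.</p>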
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/703.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Transcriptome summary</title>
		<link>http://www.bio-info-trainee.com/377.html</link>
		<comments>http://www.bio-info-trainee.com/377.html#comments</comments>
		<pubDate>Thu, 19 Mar 2015 14:22:12 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Miscellaneous notes]]></category>
		<category><![CDATA[Transcriptome software]]></category>
		<category><![CDATA[Transcriptome]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=377</guid>
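		<description><![CDATA[The site has been up for almost a month, and I have finally covered one area of bioinformatics end to end. Of course, only at a beginner's level &#8230; <a href="http://www.bio-info-trainee.com/377.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The site has been up for almost a month, and I have finally covered one area of bioinformatics end to end. Of course, that is only at a beginner's level: there are many deeper applications still to explore, and even the tools I covered have countless parameters to vary, so it gets complicated. Transcriptomics is actually not what I am best at, but for particular reasons I happened to work on three transcriptome projects, so I have a lot of material on it to share with everyone! Later I will post a site update plan, which will cover the areas I am good at: genomics and immune repertoires. Here is a brief summary of the transcriptome posts:</p>
<p>First, of course, is the transcriptome category on my site</p>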
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组总结317.png"><img class="alignnone size-full wp-image-378" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组总结317.png" alt="转录组总结317" width="553" height="283" /></a></p>
<p><a href="http://www.bio-info-trainee.com/?cat=18">http://www.bio-info-trainee.com/?cat=18</a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>Likewise, I used a script to summarize the posts for everyone</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组总结335.png"><img class="alignnone size-full wp-image-379" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组总结335.png" alt="转录组总结335" width="553" height="142" /></a></p>
<p>&nbsp;</p>
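<p>http://www.bio-info-trainee.com/?p=370 Read more: "Transcriptome: the R package clusterProfiler for GO and KEGG enrichment"</p>
<p>http://www.bio-info-trainee.com/?p=359 Read more: "Transcriptome: GO enrichment with the WEGO website"</p>
<p>http://www.bio-info-trainee.com/?p=346 Read more: "Transcriptome: annotating Trinity results with TransDecoder"</p>
<p>http://www.bio-info-trainee.com/?p=271 Read more: "Transcriptome cummeRbund operation notes"</p>
<p>http://www.bio-info-trainee.com/?p=255 Read more: "Transcriptome: differential gene analysis with edgeR"</p>
<p>http://www.bio-info-trainee.com/?p=244 Read more: "Transcriptome: counting gene expression with HTseq"</p>
<p>http://www.bio-info-trainee.com/?p=166 Read more: "Transcriptome: using the cufflinks suite"</p>
<p>http://www.bio-info-trainee.com/?p=156 Read more: "Transcriptome: using the aligner tophat"</p>
<p>http://www.bio-info-trainee.com/?p=125 Read more: "Instructions for transcriptome assembly with Trinity"</p>
<p>http://www.bio-info-trainee.com/?p=113 Read more: "Quality control of RNA-seq data with RSeQC"</p>
<p>I also covered how to download the data</p>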
<p>http://www.bio-info-trainee.com/?p=32</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组总结1058.png"><img class="alignnone size-full wp-image-380" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组总结1058.png" alt="转录组总结1058" width="554" height="175" /></a></p>
<p>&nbsp;</p>
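<p>The raw SRA data is first unpacked with SRAtoolkit, then filtered and quality-assessed, then assembled with Trinity, and the assembly is annotated. A separate branch handles differential genes, either with tophat + cufflinks + cummeRbund or with HTseq and edgeR, among others, followed by GO and KEGG pathway annotation, and so on.</p>
<p>All the code and post content is shared in my group; you are welcome to join group 201161227, 生信菜鸟团!</p>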
<p>http://www.bio-info-trainee.com/?p=1</p>
<p>Offline meetups: bioinformatics<br />
You are also welcome to download and use my Android app</p>
<p>http://www.cutt.com/app/down/840375</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/377.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Transcriptome cummeRbund operation notes</title>
		<link>http://www.bio-info-trainee.com/271.html</link>
		<comments>http://www.bio-info-trainee.com/271.html#comments</comments>
		<pubDate>Tue, 17 Mar 2015 01:34:16 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Bioinformatics omics technologies]]></category>
		<category><![CDATA[Computer basics]]></category>
		<category><![CDATA[Transcriptome software]]></category>
		<category><![CDATA[cummeRbund]]></category>
		<category><![CDATA[Differentially expressed genes]]></category>
		<category><![CDATA[Transcriptome]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=271</guid>
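		<description><![CDATA[Transcriptome cummeRbund operation notes: this goes closely with the tophat and cufflinks suite &#8230; <a href="http://www.bio-info-trainee.com/271.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p style="text-align: center;"><b>Transcriptome</b> <b>cummeRbund operation notes</b></p>
<p>This is an R package used closely together with the tophat and cufflinks suite; it can produce most of the standard figures that papers require.</p>
<p>1. Install and load the R package</p>
<p>To install, run source("http://bioconductor.org/biocLite.R"); biocLite("cummeRbund"). If the installation fails, download the source package yourself and install it as an R module. (Newer Bioconductor releases replace biocLite with BiocManager::install("cummeRbund").)</p>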
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记220.png"><img class="alignnone size-full wp-image-272" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记220.png" alt="转录组cummeRbund操作笔记220" width="554" height="199" /></a></p>
<p>Then copy the cuffdiff output directory into R's working directory, or set the working directory yourself</p>
<p>&nbsp;</p>
<p>2. Read all the files under the FN directory.</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记239.png"><img class="alignnone size-full wp-image-273" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记239.png" alt="转录组cummeRbund操作笔记239" width="466" height="181" /></a></p>
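<p>You can see that every file in the cuffdiff folder has been read. The files are listed below, and all four kinds of differential results (genes, isoforms, cds, tss) were loaded.</p>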
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记316.png"><img class="alignnone size-full wp-image-274" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记316.png" alt="转录组cummeRbund操作笔记316" width="518" height="254" /></a></p>
<p>&nbsp;</p>
<p>3. Expression level distribution plot</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记328.png"><img class="alignnone size-full wp-image-275" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记328.png" alt="转录组cummeRbund操作笔记328" width="526" height="63" /></a></p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记330.png"><img class="alignnone size-full wp-image-276" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记330.png" alt="转录组cummeRbund操作笔记330" width="553" height="419" /></a><br />
4. Expression level box plot</p>
<p>csBoxplot(genes(cuff_data))</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记371.png"><img class="alignnone size-full wp-image-277" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记371.png" alt="转录组cummeRbund操作笔记371" width="554" height="422" /></a><br />
5. Heat map of differentially expressed genes</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记386.png"><img class="alignnone size-full wp-image-278" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记386.png" alt="转录组cummeRbund操作笔记386" width="511" height="617" /></a></p>
<p>The resulting heat map:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记396.png"><img class="alignnone size-full wp-image-279" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记396.png" alt="转录组cummeRbund操作笔记396" width="475" height="347" /></a></p>
<p>&nbsp;</p>
<p>6. Get the differential genes, isoforms, TSS, CDS, and so on</p>
<p>&nbsp;</p>
<ul>
<li>Get the lists of up-regulated and down-regulated genes</li>
</ul>
<p>diffData &lt;- diffData(myGenes )</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记430.png"><img class="alignnone size-full wp-image-280" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记430.png" alt="转录组cummeRbund操作笔记430" width="554" height="171" /></a></p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记474.png"><img class="alignnone size-full wp-image-281" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记474.png" alt="转录组cummeRbund操作笔记474" width="554" height="134" /></a></p>
<p>&nbsp;</p>
<p>Only one hundred genes are differentially expressed</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记490.png"><img class="alignnone size-full wp-image-282" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记490.png" alt="转录组cummeRbund操作笔记490" width="212" height="121" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
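<p>I was going to finish by pasting one comprehensive script, but never mind: it takes up too much space and makes the page look messy, so I will not paste it.</p>
<p>That script runs automatically and produces the figures:</p>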
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记3781.png"><img class="alignnone size-full wp-image-283" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/03/转录组cummeRbund操作笔记3781.png" alt="转录组cummeRbund操作笔记3781" width="554" height="384" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/271.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
