<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Whole-exome sequencing software</title>
	<atom:link href="http://www.bio-info-trainee.com/category/omics/exon/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Exome sequencing workflow (from a paper)</title>
		<link>http://www.bio-info-trainee.com/2838.html</link>
		<comments>http://www.bio-info-trainee.com/2838.html#comments</comments>
		<pubDate>Tue, 14 Nov 2017 07:11:34 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[Whole-exome sequencing software]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2838</guid>
		<description><![CDATA[This post just serves as an image host: I needed a web URL for this figure, nothing more! 1. Quality control (fas &#8230; <a href="http://www.bio-info-trainee.com/2838.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This post just serves as an image host: I needed a web URL for this figure, nothing more!<span id="more-2838"></span></p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2017/11/wes-data-analysis-workflow.jpeg"><img class="alignnone size-full wp-image-2839" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/11/wes-data-analysis-workflow.jpeg" alt="wes-data-analysis-workflow" width="1638" height="1574" /></a></p>
<h3>1. Quality control (FastQC + toolkit)</h3>
<p>1.1 Data quality:</p>
<p>1) Base quality distribution</p>
<p>2) Read quality distribution</p>
<p>3) Read length distribution</p>
<p>4) GC content</p>
<p>&nbsp;</p>
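<p>1.2 Data filtering:</p>
<p>1) Number of raw reads</p>
<p>2) Number and proportion of reads with mean quality &gt; Q20</p>
<p>3) Number and proportion of reads with mean quality &gt; Q30</p>
<p>4) Remove reads in which more than 5% of bases have quality &lt; Q20, then report the number and proportion of clean-data reads.</p>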
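<p>A minimal Python sketch of the read statistics and filter listed above, assuming Phred+33 quality encoding (Illumina 1.8 and later). The function names are hypothetical and this is not the toolkit used in the workflow; it only illustrates the thresholds: mean quality above Q20/Q30, and discarding reads in which more than 5% of bases are below Q20.</p>

```python
def phred_scores(qual, offset=33):
    """Convert an ASCII quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def passes_filter(quals, min_q=20, max_low_frac=0.05):
    """Keep a read unless more than max_low_frac of its bases are below min_q."""
    low = sum(1 for q in quals if q < min_q)
    return low / len(quals) <= max_low_frac


def qc_stats(fastq_lines):
    """Count raw reads, reads with mean quality above Q20 and Q30, and clean reads."""
    stats = {"raw": 0, "mean_gt_q20": 0, "mean_gt_q30": 0, "clean": 0}
    it = iter(fastq_lines)
    # FASTQ records are 4 lines: header, sequence, '+', quality.
    for _header, _seq, _plus, qual in zip(it, it, it, it):
        quals = phred_scores(qual.strip())
        mean_q = sum(quals) / len(quals)
        stats["raw"] += 1
        stats["mean_gt_q20"] += int(mean_q > 20)
        stats["mean_gt_q30"] += int(mean_q > 30)
        stats["clean"] += int(passes_filter(quals))
    return stats
```

<p>An open FASTQ file handle can be passed directly, since the records are read four lines at a time; each proportion is the corresponding count divided by <code>stats["raw"]</code>.</p>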
<p>&nbsp;</p>
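<h3>2. Alignment (bwa)</h3>
<p>1) Number of reads mapped to the genome and their proportion of the total</p>
<p>2) Number of perfectly matched reads</p>
<p>3) Number of reads mapped to each chromosome</p>
<p>4) Coverage depth along each chromosome</p>
<p>5) Number of reads falling in the target regions (exons)</p>
<p>6) Number of reads falling in the target regions ±100 bp</p>
<p>7) Per-base depth over the target regions</p>
<p>8) Fraction of target bases that are covered</p>
<p>9) Fraction of target bases covered at ≥50X, ≥100X, ≥150X, ≥200X, …</p>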
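<p>Metrics 7 to 9 reduce to a few lines once you have one depth value per target base, with 0 for uncovered bases (for example from <code>samtools depth -a -b</code> with a target BED file). This is a sketch under that assumption; the helper name and threshold list are illustrative.</p>

```python
def coverage_summary(depths, thresholds=(1, 50, 100, 150, 200)):
    """Mean depth and fraction of target bases at or above each depth threshold.

    depths: one integer per target base, 0 where no read covers the base.
    A threshold of 1 gives metric 8 (fraction of target bases covered).
    """
    n = len(depths)
    summary = {"mean_depth": sum(depths) / n}
    for t in thresholds:
        summary[f"frac_ge_{t}x"] = sum(1 for d in depths if d >= t) / n
    return summary
```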
<p>&nbsp;</p>
<h3>3. Finding SNVs (samtools + Picard + GATK + VarScan)</h3>
<p>1) Picard: SAM → sort.bam</p>
<p>2) GATK: sort.bam → sort.dedup.bam (remove duplicates)</p>
<p>3) GATK: sort.dedup.bam → realign.bam (local realignment to correct indels and SNPs)</p>
<p>4) GATK: base quality score recalibration (not performed)</p>
<p>5) VarScan: call SNVs</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
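<h3>4. Variant annotation</h3>
<p>1) Annotation with ANNOVAR</p>
<p>2) Summary statistics of the annotation (synonymous and nonsynonymous variants, upstream/downstream of genes, intronic, exonic, etc.)</p>
<p>3) dbSNP annotation (whether the detected SNPs are already in dbSNP)</p>
<p>4) cosmic63: cancer-related mutations</p>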
<p>&nbsp;</p>
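<h3>5. Mutation analysis</h3>
<p>1) Distribution of SNVs across chromosomes</p>
<p>2) Distribution of SNVs across genes</p>
<p>3) Functional analysis of the genes carrying the most SNVs (KEGG pathway analysis and GO enrichment)</p>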
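<p>Steps 1 and 2 of the analysis are simple counting. A sketch, assuming each SNV has already been reduced to a hypothetical (chromosome, gene) pair, for example parsed from the ANNOVAR output; the top-ranked genes would then be the input to the KEGG/GO enrichment in step 3.</p>

```python
from collections import Counter


def snv_distribution(snvs):
    """Count SNVs per chromosome and per gene.

    snvs: iterable of (chrom, gene) pairs, one per SNV (hypothetical format).
    """
    snvs = list(snvs)  # allow generators: the data is iterated twice
    per_chrom = Counter(chrom for chrom, _gene in snvs)
    per_gene = Counter(gene for _chrom, gene in snvs)
    return per_chrom, per_gene


def top_genes(per_gene, n=20):
    """Genes with the most SNVs: candidates for KEGG/GO enrichment."""
    return [gene for gene, _count in per_gene.most_common(n)]
```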
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2838.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WES (6): annotation with ANNOVAR</title>
		<link>http://www.bio-info-trainee.com/1158.html</link>
		<comments>http://www.bio-info-trainee.com/1158.html#comments</comments>
		<pubDate>Sun, 01 Nov 2015 10:09:11 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Whole-exome sequencing software]]></category>
		<category><![CDATA[annovar]]></category>
		<category><![CDATA[annotation]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1158</guid>
		<description><![CDATA[For ANNOVAR usage, see: http://www.bio-info-train &#8230; <a href="http://www.bio-info-trainee.com/1158.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>For ANNOVAR usage, see: <a href="http://www.bio-info-trainee.com/?p=641">http://www.bio-info-trainee.com/?p=641</a></p>
<p>/home/jmzeng/bio-soft/annovar/convert2annovar.pl -format vcf4  Sample3.varscan.snp.vcf &gt; Sample3.annovar</p>
<p>/home/jmzeng/bio-soft/annovar/convert2annovar.pl -format vcf4  Sample4.varscan.snp.vcf &gt; Sample4.annovar</p>
<p>/home/jmzeng/bio-soft/annovar/convert2annovar.pl -format vcf4  Sample5.varscan.snp.vcf &gt; Sample5.annovar</p>
<p>Then annotate all samples in batch with the script below:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0016.png"><img class="alignnone size-full wp-image-1160" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0016.png" alt="image001" width="442" height="243" /></a></p>
<p>Reading gene annotation from /home/jmzeng/bio-soft/annovar/humandb/hg19_refGene.txt ... Done with 50914 transcripts (including 11516 without coding sequence annotation) for 26271 unique genes</p>
<p>The results show that relatively few of the variants actually fall in exons (line counts of each output file):</p>
<p>23515 Sample3.anno.exonic_variant_function</p>
<p>23913 Sample4.anno.exonic_variant_function</p>
<p>24009 Sample5.anno.exonic_variant_function</p>
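<p>In essence, ANNOVAR classifies the 100,000-plus SNPs we obtained by where each one lies relative to genes and whether it alters the protein. The region categories found are:</p>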
<p>downstream</p>
<p>exonic</p>
<p>exonic;splicing</p>
<p>intergenic</p>
<p>intronic</p>
<p>ncRNA_exonic</p>
<p>ncRNA_intronic</p>
<p>ncRNA_splicing</p>
<p>ncRNA_UTR3</p>
<p>ncRNA_UTR5</p>
<p>splicing</p>
<p>upstream</p>
<p>upstream;downstream</p>
<p>UTR3</p>
<p>UTR5</p>
<p>UTR5;UTR3</p>
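<p>The category breakdown above can be tallied with a short Python sketch. It assumes the first tab-separated column of each record is the region category, as in ANNOVAR's <code>.variant_function</code> output; the function name is hypothetical.</p>

```python
from collections import Counter


def count_regions(lines):
    """Tally variants per functional region (first tab-separated column)."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[line.split("\t", 1)[0]] += 1
    return counts
```

<p>Passing an open <code>.variant_function</code> file handle and calling <code>most_common()</code> on the result lists the categories from most to least frequent.</p>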
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1158.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WES (5): comparing different software</title>
		<link>http://www.bio-info-trainee.com/1150.html</link>
		<comments>http://www.bio-info-trainee.com/1150.html#comments</comments>
		<pubDate>Sun, 01 Nov 2015 10:07:44 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Whole-exome sequencing software]]></category>
		<category><![CDATA[comparison]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1150</guid>
		<description><![CDATA[The main tool here is the Venn diagram; see: http://www.bio-info-trainee &#8230; <a href="http://www.bio-info-trainee.com/1150.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
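				<content:encoded><![CDATA[<p>The main tool here is the Venn diagram; see: <a href="http://www.bio-info-trainee.com/?p=893">http://www.bio-info-trainee.com/?p=893</a></p>
<p>Using the merged and filtered high-quality SNPs, let us look at how the four SNP-calling tools differ.</p>
<p>We draw the Venn diagrams in R:</p>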
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0015.png"><img class="alignnone size-full wp-image-1151" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0015.png" alt="image001" width="769" height="426" /></a></p>
<p>The tools clearly differ quite a lot, so I use only the SNPs shared by all four tools in the analysis.</p>
<p>First, Sample3:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0025.png"><img class="alignnone size-full wp-image-1152" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0025.png" alt="image002" width="691" height="582" /></a></p>
<p>Then Sample4:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0035.png"><img class="alignnone size-full wp-image-1153" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0035.png" alt="image003" width="688" height="577" /></a></p>
<p>Then Sample5:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0045.png"><img class="alignnone size-full wp-image-1154" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0045.png" alt="image004" width="691" height="574" /></a></p>
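<p>Again the differences between tools are large, so I repeated the comparison using only SNPs at exonic positions. After all, this is exome sequencing, so the focus should be on exonic SNPs.</p>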
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0053.png"><img class="alignnone size-full wp-image-1155" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0053.png" alt="image005" width="606" height="314" /></a></p>
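<p>Drawing the Venn diagrams with the same script, the picture is now clear: most SNP sites are supported by at least two or three tools.</p>
<p>So, once the sequencing depth is high enough, the choice of SNP caller matters relatively little, at least for the exonic sites in this dataset.</p>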
<p>&nbsp;</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0061.png"><img class="alignnone size-full wp-image-1156" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0061.png" alt="image006" width="689" height="569" /></a></p>
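<p>The "how many tools support each site" view behind these Venn diagrams, and the four-tool common set used in the analysis, can also be computed directly. A sketch, assuming each tool's filtered calls have been loaded into a set of (chrom, pos) tuples; the tool names are just dictionary keys and the function name is hypothetical.</p>

```python
from collections import Counter


def support_counts(calls_by_tool):
    """Return ({k: number of sites called by exactly k tools}, sites called by all tools).

    calls_by_tool: dict mapping tool name -> set of (chrom, pos) sites.
    """
    support = Counter()
    for sites in calls_by_tool.values():
        support.update(sites)  # each tool adds 1 per site it called
    histogram = Counter(support.values())
    common = set.intersection(*calls_by_tool.values())
    return histogram, common
```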
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1150.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WES (4): comparing individuals</title>
		<link>http://www.bio-info-trainee.com/1138.html</link>
		<comments>http://www.bio-info-trainee.com/1138.html#comments</comments>
		<pubDate>Sun, 01 Nov 2015 10:05:25 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Whole-exome sequencing software]]></category>
		<category><![CDATA[individuals]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1138</guid>
		<description><![CDATA[Samples 3, 4 and 5 are the child, father and mother, respectively. For each individual I took the SNPs called by all four tools and &#8230; <a href="http://www.bio-info-trainee.com/1138.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
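				<content:encoded><![CDATA[<p>Samples 3, 4 and 5 are the child, father and mother, respectively.</p>
<p>For each individual I took the SNPs called by all four tools and analyzed only their genotypes, to check whether they are consistent with Mendelian inheritance.</p>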
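<p>The Mendelian check itself can be sketched as below. It assumes biallelic, diploid genotypes written as VCF GT strings and ignores de novo mutations. In males, chrX and chrY are hemizygous, so sites on those chromosomes would need separate handling. The function names are hypothetical, not the author's script.</p>

```python
def alleles(gt):
    """Split a VCF genotype such as '0/1' or '0|1' into its two alleles."""
    a, b = gt.replace("|", "/").split("/")
    return a, b


def is_mendelian(child, father, mother):
    """True if the child's genotype can be built from one paternal and one maternal allele."""
    c1, c2 = alleles(child)
    paternal, maternal = set(alleles(father)), set(alleles(mother))
    return (c1 in paternal and c2 in maternal) or (c2 in paternal and c1 in maternal)
```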
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0014.png"><img class="alignnone  wp-image-1144" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0014.png" alt="image001" width="653" height="555" /></a></p>
<p>The results:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0024.png"><img class="alignnone size-full wp-image-1145" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0024.png" alt="image002" width="522" height="203" /></a></p>
<p>At a glance, very few sites seem to violate Mendelian inheritance.</p>
<p>I then wrote a script to count them:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0034.png"><img class="alignnone size-full wp-image-1146" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0034.png" alt="image003" width="546" height="158" /></a></p>
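<p>Of 127,138 evaluable sites in total, 18,063 violate Mendelian inheritance, so I looked at how these sites are distributed across the chromosomes.</p>
<p>When I checked why these sites failed, I found that a record such as</p>
<p>chr1 100617887 C T:DP4=0,0,36,3 T:1/1:40 T:1/1:0,40:40 miss T:DP4=0,0,49,9 T:1/1:59 T:1/1:0,58:59 miss T:DP4=0,0,43,8 T:1/1:53 T:1/1:0,53:53 T:1/1:50</p>
<p>had been computed as chr1 100617887 C 0/0 0/0 1/1 and was therefore counted as a violation. The reason: I only assigned a variant genotype when all four tools called the SNP, and otherwise set the genotype to 0/0. In this record one tool reports "miss" for both Sample3 and Sample4, so the child and the father defaulted to 0/0 even though the other three tools all report the alternative allele T.</p>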
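<p>One way to avoid this artifact, sketched under the assumption that each tool's genotype for a sample has already been extracted (with "miss" when a tool made no call): treat partial agreement as unknown and skip the site, instead of defaulting to 0/0. The rule and the function names are illustrative, not the script used here.</p>

```python
def consensus_genotype(calls, missing="miss"):
    """Combine one sample's per-tool genotypes.

    Returns the genotype when every tool called it identically, '0/0' when no
    tool called the site (the original assumption), otherwise None (unknown).
    """
    called = [c for c in calls if c != missing]
    if not called:
        return "0/0"
    if len(called) == len(calls) and len(set(called)) == 1:
        return called[0]
    return None


def trio_genotypes(child_calls, father_calls, mother_calls):
    """Consensus genotypes for a trio, or None if any member is unknown (skip the site)."""
    gts = [consensus_genotype(c) for c in (child_calls, father_calls, mother_calls)]
    return None if None in gts else gts
```

<p>With this rule, a record like the one above is excluded from the Mendelian count rather than reported as a violation.</p>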
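<p>So I rewrote the script to use only the GATK calls. This time there were 176,036 evaluable SNPs, of which 20,309 were inconsistent. Looking at the chromosomal distribution of the inconsistent SNPs, chromosome Y looked somewhat abnormal.</p>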
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0044.png"><img class="alignnone size-full wp-image-1147" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0044.png" alt="image004" width="784" height="85" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0052.png"><img class="alignnone size-full wp-image-1148" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0052.png" alt="image005" width="786" height="84" /></a></p>
<p>Unfortunately, this went nowhere: I found nothing of interest!</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1138.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WES (3): SNP filtering</title>
		<link>http://www.bio-info-trainee.com/1137.html</link>
		<comments>http://www.bio-info-trainee.com/1137.html#comments</comments>
		<pubDate>Sun, 01 Nov 2015 10:02:59 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Whole-exome sequencing software]]></category>
		<category><![CDATA[snp]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1137</guid>
		<description><![CDATA[freebayes, bcftools and GATK all report every SNP in full detail &#8230; <a href="http://www.bio-info-trainee.com/1137.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
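				<content:encoded><![CDATA[<p>freebayes, bcftools and GATK all report every candidate SNP in full detail: as the table below shows, some tools produce more than a million SNPs, whereas the literature generally reports that exome sequencing identifies about 80,000 variants!</p>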
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0013.png"><img class="alignnone size-full wp-image-1139" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0013.png" alt="image001" width="410" height="305" /></a></p>
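<p>That is far too many variants, so they need filtering. The common criteria used here are QUAL of at least 20 and depth of at least 10; for VarScan only the depth filter is applied. The SNP counts after filtering are:</p>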
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0023.png"><img class="alignnone size-full wp-image-1140" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0023.png" alt="image002" width="413" height="241" /></a></p>
<p>perl -alne '{next if $F[5]&lt;20;/DP=(\d+)/;next if $1&lt;10;next if /INDEL/;/(DP4=.*?);/;print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$1"}' Sample3.bcftools.vcf &gt;Sample3.bcftools.vcf.filter</p>
<p>perl -alne '{next if $F[5]&lt;20;/DP=(\d+)/;next if $1&lt;10;next if /INDEL/;/(DP4=.*?);/;print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$1"}' Sample4.bcftools.vcf &gt;Sample4.bcftools.vcf.filter</p>
<p>perl -alne '{next if $F[5]&lt;20;/DP=(\d+)/;next if $1&lt;10;next if /INDEL/;/(DP4=.*?);/;print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$1"}' Sample5.bcftools.vcf &gt;Sample5.bcftools.vcf.filter</p>
<p>&nbsp;</p>
<p>perl -alne '{next if $F[5]&lt;20;/DP=(\d+)/;next if $1&lt;10;next unless /TYPE=snp/;@tmp=split/:/,$F[9];print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$tmp[0]:$tmp[1]"}'  Sample3.freebayes.vcf &gt; Sample3.freebayes.vcf.filter</p>
<p>perl -alne '{next if $F[5]&lt;20;/DP=(\d+)/;next if $1&lt;10;next unless /TYPE=snp/;@tmp=split/:/,$F[9];print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$tmp[0]:$tmp[1]"}'  Sample4.freebayes.vcf &gt; Sample4.freebayes.vcf.filter</p>
<p>perl -alne '{next if $F[5]&lt;20;/DP=(\d+)/;next if $1&lt;10;next unless /TYPE=snp/;@tmp=split/:/,$F[9];print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$tmp[0]:$tmp[1]"}'  Sample5.freebayes.vcf &gt; Sample5.freebayes.vcf.filter</p>
<p>&nbsp;</p>
<p>perl -alne '{next if $F[5]&lt;20;/DP=(\d+)/;next if $1&lt;10;next if length($F[3]) &gt;1;next if length($F[4]) &gt;1;@tmp=split/:/,$F[9];print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$tmp[0]:$tmp[1]:$tmp[2]"}'  Sample3.gatk.UG.vcf  &gt;Sample3.gatk.UG.vcf.filter</p>
<p>perl -alne '{next if $F[5]&lt;20;/DP=(\d+)/;next if $1&lt;10;next if length($F[3]) &gt;1;next if length($F[4]) &gt;1;@tmp=split/:/,$F[9];print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$tmp[0]:$tmp[1]:$tmp[2]"}'  Sample4.gatk.UG.vcf  &gt;Sample4.gatk.UG.vcf.filter</p>
<p>perl -alne '{next if $F[5]&lt;20;/DP=(\d+)/;next if $1&lt;10;next if length($F[3]) &gt;1;next if length($F[4]) &gt;1;@tmp=split/:/,$F[9];print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$tmp[0]:$tmp[1]:$tmp[2]"}'  Sample5.gatk.UG.vcf  &gt;Sample5.gatk.UG.vcf.filter</p>
<p>&nbsp;</p>
<p>perl -alne '{@tmp=split/:/,$F[9];next if $tmp[3]&lt;10;print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$tmp[0]:$tmp[3]"}' Sample3.varscan.snp.vcf &gt;Sample3.varscan.snp.vcf.filter</p>
<p>perl -alne '{@tmp=split/:/,$F[9];next if $tmp[3]&lt;10;print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$tmp[0]:$tmp[3]"}' Sample4.varscan.snp.vcf &gt;Sample4.varscan.snp.vcf.filter</p>
<p>perl -alne '{@tmp=split/:/,$F[9];next if $tmp[3]&lt;10;print "$F[0]\t$F[1]\t$F[3]\t$F[4]:$tmp[0]:$tmp[3]"}' Sample5.varscan.snp.vcf &gt;Sample5.varscan.snp.vcf.filter</p>
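<p>For readers who prefer Python to Perl one-liners, here is a sketch of the same kind of filter for a single-sample VCF: QUAL of at least 20, INFO DP of at least 10, and single-base REF and ALT only. Unlike the one-liners, it matches DP only as a complete INFO key, skips records without a DP value, and keeps just the GT field. The function name is hypothetical.</p>

```python
import re

# Match DP only as a whole INFO key (not e.g. DP4 or ADP).
DP_RE = re.compile(r"(?:^|;)DP=(\d+)")


def filter_vcf_line(line, min_qual=20.0, min_dp=10):
    """Return chrom, pos, ref and alt:GT (tab-separated) for a passing SNP, else None."""
    if line.startswith("#"):
        return None
    f = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = f[:8]
    if qual == "." or float(qual) < min_qual:
        return None
    m = DP_RE.search(info)
    if m is None or int(m.group(1)) < min_dp:
        return None
    if len(ref) != 1 or len(alt) != 1 or alt == ".":
        return None  # indels, multi-allelic or non-variant sites
    gt = f[9].split(":")[0] if len(f) > 9 else "."
    return f"{chrom}\t{pos}\t{ref}\t{alt}:{gt}"
```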
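<p>After filtering, the numbers of SNP records from the different tools are much more comparable. We first compare the SNP calls of the four tools, then compare the three individuals.</p>
<p>I then wrote a script to merge all the SNPs for comparison:</p>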
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0033.png"><img class="alignnone size-full wp-image-1141" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0033.png" alt="image003" width="493" height="562" /></a></p>
<p>This produced an interesting table, which I browsed in Excel. The main goal is to look at biological significance, but I have forgotten much of my biology and need to relearn it. <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0043.png"><img class="alignnone size-full wp-image-1142" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0043.png" alt="image004" width="955" height="454" /></a></p>
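<p>The merging script is only shown as a screenshot. A minimal sketch of the idea, assuming each filtered file has the four tab-separated columns produced by the one-liners (chrom, pos, ref, call) and writing <code>miss</code> where a tool made no call; the function names are hypothetical. The keys of <code>calls_by_tool</code> can be any labels, such as sample and tool pairs.</p>

```python
def load_filtered(lines):
    """Read a filtered file (chrom, pos, ref, call; tab-separated) into a dict."""
    calls = {}
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, ref, call = line.rstrip("\n").split("\t")
        calls[(chrom, pos, ref)] = call
    return calls


def merge_calls(calls_by_tool, tools):
    """One tab-separated row per site: chrom, pos, ref, then each tool's call or 'miss'."""
    sites = set()
    for calls in calls_by_tool.values():
        sites.update(calls)
    rows = []
    # Sort by chromosome name (lexicographic) and numeric position.
    for chrom, pos, ref in sorted(sites, key=lambda s: (s[0], int(s[1]))):
        fields = [calls_by_tool[t].get((chrom, pos, ref), "miss") for t in tools]
        rows.append("\t".join([chrom, pos, ref] + fields))
    return rows
```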
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1137.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WES (2): SNP calling</title>
		<link>http://www.bio-info-trainee.com/1114.html</link>
		<comments>http://www.bio-info-trainee.com/1114.html#comments</comments>
		<pubDate>Sun, 01 Nov 2015 10:00:37 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Whole-exome sequencing software]]></category>
		<category><![CDATA[snp]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1114</guid>
		<description><![CDATA[Preparation: download the required software and reference genome data 1. Software PS: samtools, &#8230; <a href="http://www.bio-info-trainee.com/1114.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
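				<content:encoded><![CDATA[<p>Preparation: download the required software and reference genome data</p>
<p>1. Software</p>
<p>PS: samtools, freebayes and VarScan are also used below; I had downloaded them before, so I did not repeat that step here.</p>
<p>2. Reference genome</p>
<p>3. Reference variant data</p>
<p>Step 1: download the data</p>
<p>Step 2: align with bwa</p>
<p>Step 3: convert SAM to BAM and sort</p>
<p>Step 4: mark and remove PCR duplicates</p>
<p>Step 5: generate the intervals to be realigned</p>
<p>Step 6: realign the reads in those intervals</p>
<p>Step 7: convert the final BAM file to mpileup format</p>
<p>Step 8: call SNPs with bcftools</p>
<p>Step 9: call SNPs with freebayes</p>
<p>Step 10: call SNPs with GATK</p>
<p>Step 11: call SNPs with VarScan</p>
<p>The screenshots below follow the same order; I have not tidied them up.</p>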
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0012.png"><img class="alignnone size-full wp-image-1115" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0012.png" alt="image001" width="762" height="252" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0022.png"><img class="alignnone size-full wp-image-1116" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0022.png" alt="image002" width="411" height="162" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0032.png"><img class="alignnone size-full wp-image-1117" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0032.png" alt="image003" width="677" height="128" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0042.png"><img class="alignnone size-full wp-image-1118" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0042.png" alt="image004" width="505" height="126" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0051.png"><img class="alignnone size-full wp-image-1119" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0051.png" alt="image005" width="518" height="327" /></a></p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image006.png"><img class="alignnone size-full wp-image-1120" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image006.png" alt="image006" width="298" height="63" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image007.png"><img class="alignnone size-full wp-image-1121" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image007.png" alt="image007" width="589" height="389" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image008.png"><img class="alignnone size-full wp-image-1122" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image008.png" alt="image008" width="385" height="129" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image009.png"><img class="alignnone size-full wp-image-1123" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image009.png" alt="image009" width="783" height="222" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image010.png"><img class="alignnone size-full wp-image-1124" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image010.png" alt="image010" width="368" height="126" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image011.png"><img class="alignnone size-full wp-image-1125" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image011.png" alt="image011" width="871" height="256" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image012.png"><img class="alignnone size-full wp-image-1126" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image012.png" alt="image012" width="356" height="67" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image013.png"><img class="alignnone size-full wp-image-1127" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image013.png" 
alt="image013" width="783" height="241" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image014.png"><img class="alignnone size-full wp-image-1128" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image014.png" alt="image014" width="376" height="129" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image015.png"><img class="alignnone size-full wp-image-1129" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image015.png" alt="image015" width="704" height="122" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image016.png"><img class="alignnone size-full wp-image-1130" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image016.png" alt="image016" width="385" height="68" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image017.png"><img class="alignnone size-full wp-image-1131" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image017.png" alt="image017" width="726" height="157" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image018.png"><img class="alignnone size-full wp-image-1132" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image018.png" alt="image018" width="562" height="137" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image019.png"><img class="alignnone size-full wp-image-1133" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image019.png" alt="image019" width="469" height="288" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image020.png"><img class="alignnone size-full wp-image-1134" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image020.png" alt="image020" width="715" height="151" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image021.png"><img class="alignnone size-full wp-image-1135" 
src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image021.png" alt="image021" width="445" height="309" /></a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1114.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WES (1): sequencing quality control</title>
		<link>http://www.bio-info-trainee.com/1108.html</link>
		<comments>http://www.bio-info-trainee.com/1108.html#comments</comments>
		<pubDate>Sun, 01 Nov 2015 09:58:13 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Whole-exome sequencing software]]></category>
		<category><![CDATA[QC]]></category>
		<category><![CDATA[WES]]></category>
		<category><![CDATA[sequencing quality control]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1108</guid>
		<description><![CDATA[This step mainly assesses the sequencing quality of the exome data: First, run FastQC, which produces &#8230; <a href="http://www.bio-info-trainee.com/1108.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This step mainly assesses the sequencing quality of the exome data:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0011.png"><img class="alignnone size-full wp-image-1109" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0011.png" alt="image001" width="505" height="126" /></a><span id="more-1108"></span></p>
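<p>First, run FastQC, which produces a set of plots. They will almost certainly look fine: if the data had problems, the sequencing company would not have delivered them, since that would damage its own reputation.</p>
<p>Next, we roughly compute the mean sequencing depth and the coverage of the target regions. This is the key point, although it is usually fine nowadays: capture-array technology is very mature and lab practice has improved greatly, so there are far fewer problems than before.</p>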
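<p>In this exome project the mpileup file has 1,371,416,525 lines. Since mpileup by default lists only positions covered by at least one read, about 1.37 Gb of the genome is covered, whereas projects I worked on before were typically around 600 Mb.<br />
By contrast, the exome target region is small: only 34,729,283 bp, i.e. about 35 Mb.</p>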
<p>Even with flanking sequence added, it is still far smaller:</p>
<p>54,692,160 bp: exons ±50 bp</p>
<p>73,066,288 bp: exons ±100 bp</p>
<p>90,362,533 bp: exons ±150 bp</p>
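<p>The post does not say exactly how these padded sizes were computed. One plausible sketch, assuming BED intervals and that overlapping padded exons are merged so that no base is counted twice; the function name is hypothetical.</p>

```python
def padded_target_size(intervals, pad=0):
    """Total bases covered by BED intervals after padding each by `pad` bp per side.

    intervals: iterable of (chrom, start, end), 0-based half-open.
    Overlapping or touching padded intervals are merged so no base counts twice.
    Padding is not clipped at chromosome ends.
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((max(0, start - pad), end + pad))
    total = 0
    for ivs in by_chrom.values():
        ivs.sort()
        cur_start, cur_end = ivs[0]
        for s, e in ivs[1:]:
            if s > cur_end:  # gap: close the current block
                total += cur_end - cur_start
                cur_start, cur_end = s, e
            else:  # overlap or adjacency: extend
                cur_end = max(cur_end, e)
        total += cur_end - cur_start
    return total
```

<p>Without the merge step, padding neighbouring exons would double-count the bases between them, which inflates the totals.</p>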
<p>Next, I count the mpileup records against the exon intervals to compute exon coverage and sequencing depth. This script is actually fairly tricky!</p>
<p>&nbsp;</p>
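<p>As mentioned earlier, the exome makes up only about 1% of the genome. The consensus coding sequence (CCDS) records I downloaded from NCBI, in the file CCDS.20150512.txt, are based on hg38, so they must first be converted to hg19 before computing coverage and mean depth for this project.</p>
<p>Reference: <a href="http://www.bio-info-trainee.com/?p=990">http://www.bio-info-trainee.com/?p=990</a> (converting coordinates between genome builds with liftOver)</p>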
<p><strong> awk '{print "chr"$3,$4,$5,$1,0,$2,$4,$5,"255,0,0"}' CCDS.20150512.exon.txt &gt;CCDS.20150512.exon.hg38.bed</strong></p>
<p><strong>~/bio-soft/liftover/liftOver CCDS.20150512.exon.hg38.bed ~/bio-soft/liftover/hg38ToHg19.over.chain CCDS.20150512.exon.hg19.bed unmap</strong></p>
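<p>The script below reads the converted exon records, processes the whole family of three together, and then reads each sample's mpileup file (about 20 GB each) to compute the statistics, so it is quite time-consuming.</p>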
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0021.png"><img class="alignnone size-full wp-image-1110" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0021.png" alt="image002" width="656" height="582" /></a></p>
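<p>The counting script above is only a screenshot. A minimal streaming sketch of the same idea, assuming a single-sample mpileup (chrom, 1-based position, ref base, depth, ...) and target intervals in BED format: intervals are merged and looked up with binary search, so the large mpileup file is read once, line by line. The function names are hypothetical.</p>

```python
import bisect


def load_targets(bed_lines):
    """Read BED lines into {chrom: (starts, ends)} of sorted, merged intervals."""
    raw = {}
    for line in bed_lines:
        f = line.split()
        if len(f) < 3 or not f[1].isdigit():
            continue  # skip blank, header or track lines
        raw.setdefault(f[0], []).append((int(f[1]), int(f[2])))
    targets = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s > merged[-1][1]:
                merged.append([s, e])  # disjoint: new interval
            else:
                merged[-1][1] = max(merged[-1][1], e)  # overlap: extend
        targets[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return targets


def in_target(targets, chrom, pos0):
    """True if the 0-based position pos0 falls inside a target interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    i = bisect.bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def target_depth_stats(targets, mpileup_lines):
    """Return (target size, covered target bases, mean depth over the whole target).

    mpileup_lines: single-sample mpileup (chrom, 1-based pos, ref, depth, ...).
    """
    size = sum(e - s for starts, ends in targets.values() for s, e in zip(starts, ends))
    covered = depth_sum = 0
    for line in mpileup_lines:
        f = line.split("\t", 4)
        if in_target(targets, f[0], int(f[1]) - 1):  # BED is 0-based, mpileup 1-based
            depth = int(f[3])
            depth_sum += depth
            if depth > 0:
                covered += 1
    return size, covered, depth_sum / size
```

<p>Target bases absent from the mpileup count as depth 0, so the mean is taken over the whole target; off-target lines could be summed the same way to estimate the background depth.</p>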
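<p>The mean depth over the exon targets is close to 100X, which clearly indicates very good capture efficiency, while the genome-wide background depth is only 3.3X. This is consistent with how capture works: fragments that hybridize to the probes over more bases are captured more readily than those that hybridize over fewer. Background DNA captured through non-specific hybridization is also sequenced, which gives low, non-specific coverage across the rest of the genome.</p>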
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0031.png"><img class="alignnone size-full wp-image-1111" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0031.png" alt="image003" width="471" height="339" /></a></p>
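<p>Next comes a simple summary of sequencing depth (script below), though the resulting plot is not very informative, since the 35 Mb exon target averages close to 100X.</p>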
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0041.png"><img class="alignnone size-full wp-image-1112" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/11/image0041.png" alt="image004" width="511" height="300" /></a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1108.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
