<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Variants</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e5%8f%98%e5%bc%82/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>A variant-calling workflow</title>
		<link>http://www.bio-info-trainee.com/2790.html</link>
		<comments>http://www.bio-info-trainee.com/2790.html#comments</comments>
		<pubDate>Mon, 30 Oct 2017 02:55:56 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[tutorial]]></category>
		<category><![CDATA[变异]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2790</guid>
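		<description><![CDATA[Put simply, variant calling means aligning the many thousands of sequence fragments produced by high-throughput sequencing to a suitable reference genome, &#8230; <a href="http://www.bio-info-trainee.com/2790.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>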
				<content:encoded><![CDATA[<div class="markdown-here-wrapper" data-md-url="http://www.bio-info-trainee.com/wp-admin/post-new.php">
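<p style="margin: 0px 0px 1.2em !important;">Put simply, variant calling means aligning the many thousands of sequence fragments produced by high-throughput sequencing to a suitable reference genome, and then finding the small differences between the successfully aligned fragments and that reference.</p>
<p style="margin: 0px 0px 1.2em !important;">That brings in the FASTQ format that stores the sequencing data, the alignment tool, the SAM format of the alignments, the tool that finds the small differences, the VCF file that records them, and the choice of software and parameters at every step. Most important of all, though, is to get the whole workflow running end to end and to understand what you are doing.</p>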
<p style="margin: 0px 0px 1.2em !important;"><span id="more-2790"></span></p>
<h1 id="-" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.6em; border-bottom: 1px solid #dddddd;">A simulated project</h1>
<ul style="margin: 1.2em 0px; padding-left: 2em;">
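<li style="margin: 0.5em 0px;">First, download the FASTA sequences of chromosomes X and Y; both are available from UCSC.</li>
<li style="margin: 0.5em 0px;">Next, build a bwa index for chromosome X.</li>
<li style="margin: 0.5em 0px;">Then simulate sequencing data for chromosome Y. The simulation program is very simple: it generates read pairs from chromosome Y (PE100, insert size 400).</li>
<li style="margin: 0.5em 0px;">Align the simulated reads to the chromosome X reference and summarize the alignment results.</li>
<li style="margin: 0.5em 0px;">Finally, call variant sites from the BAM file of successfully aligned reads.</li>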
</ul>
<p style="margin: 0px 0px 1.2em !important;">The code is as follows:</p>
<pre style="font-size: 1em; font-family: Consolas, Inconsolata, Courier, monospace; line-height: 1.2em; margin: 1.2em 0px;"><code style="font-size: 0.85em; font-family: Consolas, Inconsolata, Courier, monospace; margin: 0px 0.15em; padding: 0.5em 0.7em; white-space: pre; border: 1px solid #cccccc; background-color: #f8f8f8; border-radius: 3px; display: block !important; overflow: auto;">## bwa-0.7.15 installed from source
## samtools installed with conda
cd tmp/chrX_Y/hg19/
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chrX.fa.gz 
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chrY.fa.gz 
gunzip chrX.fa.gz
gunzip chrY.fa.gz
~/biosoft/bwa/bwa-0.7.15/bwa index chrX.fa
perl simulate.pl chrY.fa ## this perl script is shown in the image at http://www.bio-info-trainee.com/wp-content/uploads/2015/10/tmp.png 
~/biosoft/bwa/bwa-0.7.15/bwa mem -t 5 -M chrX.fa read*.fa &gt;read.sam
samtools view -bS read.sam &gt;read.bam
samtools flagstat read.bam
samtools sort -@ 5 -o read.sorted.bam read.bam
samtools view -h -F4 -q 5 read.sorted.bam |samtools view -bS|samtools rmdup - read.filter.rmdup.bam
samtools index read.filter.rmdup.bam
samtools mpileup -ugf ~/tmp/chrX_Y/hg19/chrX.fa read.filter.rmdup.bam |bcftools call -vmO z -o read.bcftools.vcf.gz
## Load the fa/bam/vcf files into IGV for visualization and take a screenshot of one variant site
## See http://www.biotrainee.com/thread-696-1-1.html
</code></pre>
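<p>To make the filtering step in the pipeline above concrete: <code>-F4</code> drops every read whose FLAG has the 0x4 (unmapped) bit set, and <code>-q 5</code> drops alignments whose MAPQ is below 5. The bash sketch below only mimics that decision for a single read; the function name <code>keeps_read</code> is made up for illustration and is not part of samtools.</p>

```shell
#!/usr/bin/env bash
# Illustration only: mimic the read filter applied by `samtools view -F4 -q 5`.
# keeps_read is a hypothetical helper, not a samtools command.
keeps_read() {
    local flag=$1 mapq=$2
    # keep the read when the unmapped bit (0x4) is clear and MAPQ is at least 5
    (( (flag & 4) == 0 && mapq >= 5 ))
}

# FLAG 99 = paired, properly paired, mate on reverse strand, first in pair
if keeps_read 99 60; then echo "flag 99, MAPQ 60: kept"; fi
# FLAG 77 = paired, read unmapped, mate unmapped, first in pair
if ! keeps_read 77 0; then echo "flag 77, MAPQ 0: dropped"; fi
```

<p>Since chromosome Y reads are being forced onto a chromosome X reference here, many reads would be expected to fail one of these two tests, which is why the flagstat summary is worth reading before calling variants.</p>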
<h1 id="-" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.6em; border-bottom: 1px solid #dddddd;">The variant-calling workflow</h1>
<p style="margin: 0px 0px 1.2em !important;">The complete workflow can be quite complex:</p>
<p style="margin: 0px 0px 1.2em !important;"><a href="http://www.bio-info-trainee.com/wp-content/uploads/2017/10/Workflow-for-pharmacogenomics-using-WES-or-WGS-After-mapping-to-the-reference-sequence.jpg"><img class="alignnone size-full wp-image-2796" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/10/Workflow-for-pharmacogenomics-using-WES-or-WGS-After-mapping-to-the-reference-sequence.jpg" alt="workflow-for-pharmacogenomics-using-wes-or-wgs-after-mapping-to-the-reference-sequence" width="600" height="439" /></a></p>
<p style="margin: 0px 0px 1.2em !important;">Even the upstream variant-calling part alone can be quite complex:</p>
<p style="margin: 0px 0px 1.2em !important;"><a href="http://www.bio-info-trainee.com/wp-content/uploads/2017/10/Variant-analysis-workflow-specifications.png"><img class="alignnone size-full wp-image-2793" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/10/Variant-analysis-workflow-specifications.png" alt="variant-analysis-workflow-specifications" width="570" height="519" /></a></p>
<p style="margin: 0px 0px 1.2em !important;">Source: the 2017 BMC Bioinformatics article <a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1454-2">MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants</a></p>
</div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2790.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>[Live] My Genome (12): A first rough look at a few genes</title>
		<link>http://www.bio-info-trainee.com/2116.html</link>
		<comments>http://www.bio-info-trainee.com/2116.html#comments</comments>
		<pubDate>Fri, 09 Dec 2016 01:05:47 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[直播我的个人基因组]]></category>
		<category><![CDATA[变异]]></category>
		<category><![CDATA[基因组]]></category>
		<category><![CDATA[直播]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2116</guid>
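		<description><![CDATA[As mentioned yesterday, once the FASTQ files from sequencing have been mapped to the genome, we usually end up with a sa &#8230; <a href="http://www.bio-info-trainee.com/2116.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>As mentioned yesterday, once the FASTQ files from sequencing have been mapped to the genome, we usually end up with a file with the extension .sam or .bam. SAM stands for Sequence Alignment/Map format, and BAM is the binary form of SAM. SAM files are usually too large, so we generate BAM files to save space. Converting between the two can be done with samtools.<span id="more-2116"></span></p>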
<pre class="">samtools view -h abc.bam &gt; abc.sam
samtools view -b -S abc.sam &gt; abc.bam</pre>
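<p>We already have a BAM file; for now I will simply use the one the company gave me. Following my earlier post, <a><strong>Calling variants only for the genes of interest</strong></a>, we can start by looking at the variants in a few interesting genes. I am personally curious about the following sites and genes, so I will use them to walk through today's content!</p>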
<section class="" data-source="bj.96weixin.com">
<section>
<section class="">
<section class="">
<section class="">
<section class="">
<section class="">
<section class="">
<section>
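<blockquote><p><strong>1</strong><strong>.</strong> rs7574865 in STAT4 and rs9275319 in HLA-DQ are genetic susceptibility loci for hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) in the Chinese population.</p>
<p><strong>2.</strong> The V1aR gene is the signature male "infidelity gene".</p>
<p><strong>3.</strong> GLI3 and PAX1 control the size of the nostrils, while RUNX2 controls the width of the nose bridge. DCHS2 regulates how far the nose protrudes, that is, whether the tip points upward and at what angle; in other words, it decides whether your nose is attractively straight.</p>
<p><strong>4.</strong> The obesity-related gene FTO (Fat Mass and Obesity Associated); more recently, the genes found to regulate obesity (mainly fat burning) are IRX3 and IRX5. About 100 loci are associated with BMI (body mass index), 600 loci with height, and 160 loci with obesity traits such as waist-to-hip ratio. Six new loci lie within or near the genes LEMD2, CD47, GANAB, RPS6KA5/C14orf159, ANP32 and ARL15.</p></blockquote>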
</section>
</section>
</section>
</section>
</section>
</section>
</section>
</section>
</section>
<p>So let's focus on these genes first (don't ask me why (-_-メ)).</p>
<p>First, look up the coordinates of these genes; the result is shown below:</p>
<p><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wzuDK0p23Mt9iaibtiboAS9stdsOz1yfFOwYYN4edABaXPnTTSlF3KpF0c5bEPhZibciccm0rFHsCIp7Sw/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-ratio="0.5738724727838258" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wzuDK0p23Mt9iaibtiboAS9stdsOz1yfFOwYYN4edABaXPnTTSlF3KpF0c5bEPhZibciccm0rFHsCIp7Sw/0?wx_fmt=png" data-type="png" data-w="643" data-fail="0" /></p>
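<p>The V1aR gene, the signature male "infidelity gene", is actually called AVPR1A in the standard gene nomenclature: <a>http://www.genecards.org/cgi-bin/carddisp.pl?gene=AVPR1A</a>. This touches on the concept of HUGO symbols. The GeneCards database is also excellent; all kinds of gene-related information can be looked up there.</p>
<p>With these coordinates in hand, we move into our genome working directory:</p>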
<p>cd data/project/myGenome/</p>
<p>Then prepare the coordinates file:</p>
<p><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wzuDK0p23Mt9iaibtiboAS9stdR8nx5AAPcSnMO1Nca4PsbRAju4PFHqU4Q0BqYJHxflw2bKxJlAgc3A/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-ratio="0.7729468599033816" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wzuDK0p23Mt9iaibtiboAS9stdR8nx5AAPcSnMO1Nca4PsbRAju4PFHqU4Q0BqYJHxflw2bKxJlAgc3A/0?wx_fmt=png" data-type="png" data-w="414" data-fail="0" /></p>
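<p>The BAM file the company gave me was aligned to GRCh37 rather than hg19 (for our purposes the difference is whether chromosome names carry the chr prefix), so we still need to download that reference:</p>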
<section class="" data-source="bj.96weixin.com">
<section class="">
<section>
<blockquote><p>cd ~/reference</p>
<p>mkdir -p  genome/human_g1k_v37  &amp;&amp; cd genome/human_g1k_v37</p>
<p># http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/</p>
<p>nohup wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz  &amp;</p>
<p>wait  # the download runs in the background, so let it finish before decompressing</p>
<p>gunzip human_g1k_v37.fasta.gz</p>
<p>wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.fai</p>
<p>wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/README.human_g1k_v37.fasta.txt</p></blockquote>
</section>
</section>
</section>
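<p>Then go back to the genome working directory, make sure the BAM file sits in the bamFiles directory shown in the figure above, and use the following script to extract, in one batch, the variants in the genes we are interested in:</p>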
<section class="" data-source="bj.96weixin.com">
<section class="">
<section>
<blockquote><p>cat key_gene.list |while read id;</p>
<p>do</p>
<p>chr=$(echo $id |cut -d" " -f 1|sed 's/chr//' )</p>
<p>start=$(echo $id |cut -d" " -f 2 )</p>
<p>end=$(echo $id |cut -d" " -f 3 )</p>
<p>gene=$(echo $id |cut -d" " -f 4 )</p>
<p>echo $chr:$start-$end  $gene</p>
<p>samtools mpileup -r  $chr:$start-$end   -ugf ~/reference/genome/human_g1k_v37/human_g1k_v37.fasta bamFiles/P_jmzeng.final.bam  | \</p>
<p>bcftools call -vmO z -o $gene.vcf.gz</p>
<p>done</p></blockquote>
</section>
</section>
</section>
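<p>The loop above pulls each field out with a separate <code>cut</code> call. As a hedged alternative, the sketch below lets <code>read</code> split the line and strips the chr prefix with a parameter expansion. It assumes, as the <code>cut</code> calls suggest, that each line of key_gene.list holds chromosome, start, end and gene name separated by spaces; <code>to_region</code> is a hypothetical helper and the record shown is made up, not a real gene coordinate.</p>

```shell
#!/usr/bin/env bash
# Sketch: parse one "chr start end gene" record into a samtools-style region.
# to_region is a hypothetical helper; the column layout is assumed from the cut calls above.
to_region() {
    local chr start end gene
    read -r chr start end gene <<< "$1"
    # GRCh37 (human_g1k_v37) names chromosomes without the "chr" prefix
    printf '%s:%s-%s\t%s\n' "${chr#chr}" "$start" "$end" "$gene"
}

# Made-up record, not real gene coordinates
to_region "chr2 1000 2000 GENE1"
```

<p>In the real loop, the first output field would be what gets passed to <code>samtools mpileup -r</code>, and the second would name the output VCF.</p>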
<p>It finishes in about three minutes; the results look like this:</p>
<p><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wzuDK0p23Mt9iaibtiboAS9stdyLt1xV5cOvAyzv4p6kAiaQQKiajuvSHZgB7wQbiamAMVP8eIsuZGXIAAQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-ratio="0.6282420749279539" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wzuDK0p23Mt9iaibtiboAS9stdyLt1xV5cOvAyzv4p6kAiaQQKiajuvSHZgB7wQbiamAMVP8eIsuZGXIAAQ/0?wx_fmt=png" data-type="png" data-w="347" data-fail="0" /></p>
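<p>As mentioned above, studies have shown that rs7574865 in STAT4 and rs9275319 in HLA-DQ are genetic susceptibility loci for hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) in the Chinese population. Their coordinates are easy to find in dbSNP or in SNPedia, a database I have been strongly recommending lately (<a>Highly recommended: the SNPedia database, with very rich SNP records</a>).</p>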
<p>6 32666295 :Rs9275319--HLA-DQ</p>
<p>2 191964633 :Rs7574865--STAT4</p>
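<p>Then I checked the variant files I had just called:</p>
<p>zcat STAT4.vcf.gz |grep -w 191964633 &nbsp; # prints nothing</p>
<p>zcat HLA-DQ* |grep 32666295 &nbsp; # also prints nothing</p>
<p>Haha, I neatly dodged both susceptibility sites! Thank goodness!</p>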
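<p>Two caveats on the check above. First, <code>grep</code> matches the number anywhere in the line, not just in the POS column. Second, <code>bcftools call -v</code> writes variant sites only, so an empty result means no variant was called there; it does not by itself show that the site was well covered. A stricter lookup might be sketched as follows, where <code>vcf_lookup</code> is a hypothetical helper and the toy records are made up:</p>

```shell
#!/usr/bin/env bash
# Sketch: print VCF records at an exact chromosome and position.
# vcf_lookup is a hypothetical helper; it reads uncompressed VCF text on stdin.
vcf_lookup() {
    awk -F'\t' -v chr="$1" -v pos="$2" '!/^#/ && $1 == chr && $2 == pos'
}

# Toy VCF body with made-up records, only to show the matching
printf '#CHROM\tPOS\tID\tREF\tALT\n2\t100\t.\tA\tG\n2\t1000\t.\tC\tT\n' | vcf_lookup 2 100
```

<p>On the real data this would be used as <code>zcat STAT4.vcf.gz | vcf_lookup 2 191964633</code>.</p>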
<p>I won't go into the rest, since that would touch on private matters; the point here is the method!</p>
<p>Text: Jimmy, 吃瓜群众</p>
<p>Layout and editing: 吃瓜群众</p>
<p><img src="http://mmbiz.qpic.cn/mmbiz_jpg/cZNhZQ6j4wymhcSic2cJhIW7L17Lp7oL42xc4bOY4QnvWrTeMvFwpqLnqQl7etiaKkjysN0pdqVHhhXOAfLyzxdw/640?wx_fmt=jpeg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-ratio="0.9140625" data-s="300,640" data-src="http://mmbiz.qpic.cn/mmbiz_jpg/cZNhZQ6j4wymhcSic2cJhIW7L17Lp7oL42xc4bOY4QnvWrTeMvFwpqLnqQl7etiaKkjysN0pdqVHhhXOAfLyzxdw/640?wx_fmt=jpeg" data-type="jpeg" data-w="640" data-fail="0" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2116.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>[Live] My Genome (7): The big picture of variant sites in whole-genome sequencing data</title>
		<link>http://www.bio-info-trainee.com/2030.html</link>
		<comments>http://www.bio-info-trainee.com/2030.html#comments</comments>
		<pubDate>Wed, 23 Nov 2016 02:08:00 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[直播我的个人基因组]]></category>
		<category><![CDATA[变异]]></category>
		<category><![CDATA[基因组]]></category>
		<category><![CDATA[直播]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2030</guid>
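		<description><![CDATA[First, remember one very important point: variation is relative! Briefly, what is variant calling, and how does a variant differ from a mutation &#8230; <a href="http://www.bio-info-trainee.com/2030.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>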
				<content:encoded><![CDATA[<section class="" data-source="bj.96weixin.com">
<section>
<section class="">
<section>
<section class="">
<section class="">
<section>
<section>
<section><strong>First</strong>, remember one very important point: variation is relative!</section>
</section>
</section>
</section>
</section>
</section>
</section>
</section>
</section>
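<p>Briefly, what is variant calling, and how does a variant differ from a mutation? An example: international bodies have defined a human reference genome (UCSC, ENSEMBL, NCBI and so on, all covered in earlier posts); say it is AAAAA (simplified to 5 bases here; the real human genome has about 3 billion). Sequencing myself shows that my corresponding sequence is AGCAA. Relative to the reference I therefore have 2 variant sites, at genome coordinates 2 and 3, but that does not yet make them mutations.<span id="more-2030"></span></p>
<p>Take the second base. Mine is G and the reference is A, but about a million people worldwide have been sequenced, and when I look at their results, 990,000 of them have G. This suggests the oddity lies in the reference genome: perhaps the person the international body happened to pick back then had an A, so the second base was defined as A. So although the software reports this site as a variant relative to the reference, that is actually good news, nothing to worry about, and we have no need to call it a mutation!</p>
<p>Now the third base. Again the reference says A and I have a C, but of the one million genomes published worldwide, 999,990 match the reference with an A. The few who differ from the reference are patients with a disease. In that case you are in trouble: this variant is a mutation!</p>
<p>Many variants merely underlie human diversity and make each person unique, whereas variants associated with disease are what we usually call mutations. Because I gave only two extreme examples, you might conclude that matching the majority means you are fine. That is not quite right either. Generally, a variant seen at a frequency of 5% in databases of healthy people can be considered to do no great harm. Variants can also be classified as germline, somatic, de novo and so on, and for a specific disease one can further look for driver mutations. In any case, we first have to find where our own sequencing data differ from the internationally agreed reference genome (the variants)!</p>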
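<p>The five-base example can be spelled out in a few lines of bash. This is only a toy comparison of two equal-length strings, nothing like what a real variant caller does with reads and quality scores; <code>list_variants</code> is a hypothetical helper written for this illustration.</p>

```shell
#!/usr/bin/env bash
# Sketch: list positions where a sample sequence differs from the reference.
# list_variants is a hypothetical helper for the five-base toy example only.
list_variants() {
    local ref=$1 sample=$2 i
    for (( i = 0; i < ${#ref}; i++ )); do
        if [ "${ref:i:1}" != "${sample:i:1}" ]; then
            # 1-based position, reference base, sample base
            printf '%d\t%s\t%s\n' "$((i + 1))" "${ref:i:1}" "${sample:i:1}"
        fi
    done
}

list_variants AAAAA AGCAA
```

<p>The two lines printed are positions 2 (A to G) and 3 (A to C). Whether either one deserves the word mutation depends on population frequency and disease association, which this comparison knows nothing about.</p>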
<section class="" data-source="bj.96weixin.com"><img class="" src="http://mmbiz.qpic.cn/mmbiz/iaGswicCbWm6ic2rtZ8OSsoVQKbk32I4libuUicJbdN9ibSSoQEiafWiatLdtS7KIVeecYNDTdzG3ibCoYW3VVwhgHm8cBA/640?wx_fmt=jpeg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-src="http://mmbiz.qpic.cn/mmbiz/iaGswicCbWm6ic2rtZ8OSsoVQKbk32I4libuUicJbdN9ibSSoQEiafWiatLdtS7KIVeecYNDTdzG3ibCoYW3VVwhgHm8cBA/0?wx_fmt=jpeg" data-ratio="0.042435424354243544" data-w="542" data-fail="0" /></section>
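<p>Variants fall into four types: SNV, indel, CNV and SV. In most cases only SNVs get analyzed; the other three are either called less accurately or are rather harder to work with!</p>
<p>The workflow given by the renowned Heng Li, author of bwa, is here: <a target="_blank">http://www.htslib.org/workflow/</a></p>
<p>Following Heng Li's blog, I have run variant calling on dozens of exome datasets myself, including one autism pedigree. Finding variants by comparison with the reference genome is not hard; the real problem is giving a sensible interpretation to the tens of thousands to millions of variants found.</p>
<p>My workflow back then was as follows: (<a target="_blank">http://www.bio-info-trainee.com/1114.html</a>)</p>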
<section class="" data-source="bj.96weixin.com">
<section>
<section class="">
<section>
<section class="">
<section class="">
<section>
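<section>
<p>Step 1: download the data</p>
<p>Step 2: align with bwa</p>
<p>Step 3: convert SAM to BAM and sort</p>
<p>Step 4: mark PCR duplicates and remove them</p>
<p>Step 5: generate the list of intervals that need realignment</p>
<p>Step 6: realign the alignments according to that interval file</p>
<p>Step 7: convert the final BAM file to an mpileup file</p>
<p>Step 8: call SNPs with bcftools</p>
<p>Step 9: call SNPs with freebayes</p>
<p>Step 10: call SNPs with GATK</p>
<p>Step 11: call SNPs with VarScan</p>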
</section>
</section>
</section>
</section>
<section class=""></section>
</section>
</section>
</section>
</section>
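<p>For this whole-genome dataset I plan to follow the same workflow. Finding variants is not the main point, and if something turns out to be wrong along the way, we can always come back and see where the problem lies!</p>
<p>The software, reference genome and annotation files needed were all covered in my earlier posts.</p>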
<section class="" data-source="bj.96weixin.com"><img class="" src="http://mmbiz.qpic.cn/mmbiz/iaGswicCbWm68C9f3f4F1vFsqdVibUnyJZCV1PxfbBqBicItInMQNoiaqO8dm9mwLvfO71thBzKktF5ib6YUoWE8knvg/640?wx_fmt=jpeg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-src="http://mmbiz.qpic.cn/mmbiz/iaGswicCbWm68C9f3f4F1vFsqdVibUnyJZCV1PxfbBqBicItInMQNoiaqO8dm9mwLvfO71thBzKktF5ib6YUoWE8knvg/0?wx_fmt=jpeg" data-ratio="0.042435424354243544" data-w="542" data-fail="0" /></section>
<p>You can process the whole-genome sequencing data of the individual KPGP0001 with the simple code below:</p>
<section class="" data-source="bj.96weixin.com">
<section>
<section class="">
<section class="">
<section class="">
<section class="">
<section class="">
<section>
<p>ls *gz |xargs ~/biosoft/fastqc/FastQC/fastqc -t 10</p>
<p>for i in $(seq 1 6) ;do (nohup ~/biosoft/bwa/bwa-0.7.15/bwa  mem -t 5 -M ~/reference/index/bwa/hg19  KPGP-00001_L${i}_R1.fq.gz KPGP-00001_L${i}_R2.fq.gz 1&gt;KPGP-00001_L${i}.sam 2&gt;KPGP-00001_L${i}.bwa.align.log &amp;);done</p>
<p>for i in $(seq 1 6) ;do (nohup samtools sort -@ 5 -o KPGP-00001_L${i}.sorted.bam  KPGP-00001_L${i}.sam &amp;);done</p>
<p>for i in $(seq 1 6) ;do (nohup samtools index KPGP-00001_L${i}.sorted.bam &amp;);done</p>
<p>samtools merge KPGP-00001.merge.bam *.sorted.bam</p>
<p>samtools sort -@ 50 -O bam -o KPGP-00001.sorted.merge.bam  KPGP-00001.merge.bam</p>
<p>samtools index  KPGP-00001.sorted.merge.bam</p>
<p>for i in $(seq 1 6) ;do ( samtools flagstat KPGP-00001_L${i}.sorted.bam &gt;KPGP-00001_L${i}.flagstat.txt );done</p>
</section>
</section>
</section>
</section>
</section>
</section>
</section>
</section>
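<p>A note on running the commands above: each lane is launched with <code>nohup ... &amp;</code>, so every loop returns immediately. Typed one at a time, with a pause until the jobs finish, that is fine; pasted into a single script, the sort step could start before the alignments are complete. One common way to handle this is to start the jobs from the script's own shell and call <code>wait</code>. The sketch below uses <code>sleep</code> and <code>echo</code> as stand-ins for the real bwa commands, and <code>run_lanes</code> is a hypothetical name.</p>

```shell
#!/usr/bin/env bash
# Sketch: run per-lane jobs in parallel, then block until all have finished.
# run_lanes and the sleep/echo stand-in are placeholders for the real bwa commands.
run_lanes() {
    local i
    for i in $(seq 1 6); do
        ( sleep 0.1; echo "lane $i aligned" ) &
    done
    wait    # returns only after every background job started above has exited
    echo "all lanes finished"
}

run_lanes
```

<p>The six lane messages may appear in any order, but the final line is printed only once all of them are done, which is the guarantee the next pipeline step needs.</p>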
<p><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wwk0dMwwaM5BBnlRicmUA328jW0C0GKzCGXSQWW7qhkX0Hicomtxlg0acrPjIOl64oSq7KQiaONqNDEQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wwk0dMwwaM5BBnlRicmUA328jW0C0GKzCGXSQWW7qhkX0Hicomtxlg0acrPjIOl64oSq7KQiaONqNDEQ/0?wx_fmt=png" data-type="png" data-ratio="1.3885350318471337" data-w="471" data-fail="0" /></p>
<p>Researchers have processed the WGS data of the <em>35 Korean genomes in the Korean Personal Genomes Project (KPGP)</em>, using two SNV-calling pipelines in their paper: <a target="_blank">http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-S11-S6</a> The pipelines are shown below for reference.</p>
<p><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wwk0dMwwaM5BBnlRicmUA328VU1icGkpoL2kY53KanZI79Z6SNRYGNYkEAWFOcSAaqbB1TDSRDz6YAA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-ratio="0.7866666666666666" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wwk0dMwwaM5BBnlRicmUA328VU1icGkpoL2kY53KanZI79Z6SNRYGNYkEAWFOcSAaqbB1TDSRDz6YAA/0?wx_fmt=png" data-type="png" data-w="1200" data-fail="0" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2030.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converting dbSNP IDs to HGVS mutation notation</title>
		<link>http://www.bio-info-trainee.com/1520.html</link>
		<comments>http://www.bio-info-trainee.com/1520.html#comments</comments>
		<pubDate>Sun, 10 Apr 2016 01:11:47 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[生信基础]]></category>
		<category><![CDATA[dbsnp]]></category>
		<category><![CDATA[HGVS]]></category>
		<category><![CDATA[变异]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1520</guid>
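		<description><![CDATA[dbSNP IDs are described in detail on NCBI's official dbSNP site, which has now been updated to 1 &#8230; <a href="http://www.bio-info-trainee.com/1520.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>dbSNP IDs are described in detail on NCBI's official dbSNP site, which has now reached build 146. An ID by itself tells most people nothing; after all, it is just an identifier assigned by NCBI. An HGVS mutation description, by contrast, carries very detailed information.</p>
<p>The Human Genome Variation Society (HGVS) officially specifies how a mutation should be written down: <a href="http://www.hgvs.org/mutnomen/recs.html">http://www.hgvs.org/mutnomen/recs.html  (careful reading recommended for everyone!)</a></p>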
<p><span id="more-1520"></span></p>
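<p>There is also a program that derives the HGVS description from chromosome coordinates: <a href="https://github.com/counsyl/hgvs">https://github.com/counsyl/hgvs</a>. It is a bit complicated, so we will leave it aside for now!</p>
<p>There is actually a video tutorial on YouTube (<a class="constant" href="http://browser.1000genomes.org/Help/Movie?id=284">BioMart: Variation IDs to HGNC Symbols</a>), but since most readers cannot get past the firewall to watch it, here is a shortcut!</p>
<p>The shortcut is to build the URL directly from the rs ID; any of the following three forms works!</p>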
<p><a href="http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs1800234">http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs1800234 </a></p>
<p><a href="http://www.ncbi.nlm.nih.gov/snp/1800234">http://www.ncbi.nlm.nih.gov/snp/1800234</a></p>
<p><a href="http://browser.1000genomes.org/Homo_sapiens/Variation/Explore?v=rs1800234">http://browser.1000genomes.org/Homo_sapiens/Variation/Explore?v=rs1800234</a></p>
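<p>Building the three URLs from an rs ID is plain string substitution. A minimal sketch, using only the URL patterns listed above; <code>rs_urls</code> is a hypothetical helper name, and whether these addresses still resolve today is not something this sketch checks.</p>

```shell
#!/usr/bin/env bash
# Sketch: print the three lookup URLs for one rs ID.
# rs_urls is a hypothetical helper; the URL patterns are the ones listed in the post.
rs_urls() {
    local rs=${1#rs}   # accept "rs1800234" or "1800234"
    printf 'http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs%s\n' "$rs"
    printf 'http://www.ncbi.nlm.nih.gov/snp/%s\n' "$rs"
    printf 'http://browser.1000genomes.org/Homo_sapiens/Variation/Explore?v=rs%s\n' "$rs"
}

rs_urls rs1800234
```

<p>Piping a list of IDs through this function, one per line, gives the pages to fetch for a whole batch.</p>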
<p>&nbsp;</p>
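<p>Below we look at what each of the three returns:</p>
<p>Scrape the page dbSNP returns and extract the relevant part:</p>
<p>For example: <a href="http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs1800234">http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs1800234 </a></p>
<p>The HGVS names are easy to spot:</p>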
<table id="HGVS Names" width="200" cellpadding="2">
<tbody>
<tr>
<th class="text10" colspan="1" align="center" bgcolor="#ccccff">HGVS Names</th>
</tr>
<tr>
<td>
<div class="jig-ncbiexpander" data-jigconfig="auto:false,minHeight:'160px'">
<div class="expanderWrapper ui-ncbiexpander">
<ul class="dd_list">
<li>NC_000022.10:g.46615880T&gt;C</li>
<li>NC_000022.11:g.46219983T&gt;C</li>
<li>NG_012204.1:g.74382T&gt;C</li>
<li>NM_001001928.2:c.680T&gt;C</li>
<li>NM_005036.4:c.680T&gt;C</li>
<li>NP_001001928.1:p.Val227Ala</li>
<li>NP_005027.2:p.Val227Ala</li>
<li>XM_005261653.1:c.680T&gt;C</li>
<li>XM_005261654.1:c.680T&gt;C</li>
<li>XM_005261655.1:c.680T&gt;C</li>
</ul>
</div>
</div>
</td>
</tr>
</tbody>
</table>
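<p>Given a saved copy of such a page, the HGVS names can be pulled out of the list items with standard text tools. The sketch below assumes one <code>&lt;li&gt;</code> item per line, as in the table above; the live page layout may differ, and <code>extract_hgvs</code> and <code>hgvs_level</code> are hypothetical helper names. The prefix after the colon gives the description level: g. for genomic, c. for coding DNA, p. for protein.</p>

```shell
#!/usr/bin/env bash
# Sketch: pull HGVS names out of <li> items and keep one description level.
# extract_hgvs and hgvs_level are hypothetical helpers; the real page layout may differ.
extract_hgvs() {
    # keep the text between <li> and </li>, then decode the &gt; entity
    sed -n 's|.*<li>\(.*\)</li>.*|\1|p' | sed 's/&gt;/>/g'
}

hgvs_level() {
    # $1 is g (genomic), c (coding DNA) or p (protein)
    grep ":$1\."
}

printf '%s\n' \
    '<li>NC_000022.10:g.46615880T&gt;C</li>' \
    '<li>NM_005036.4:c.680T&gt;C</li>' \
    '<li>NP_005027.2:p.Val227Ala</li>' | extract_hgvs | hgvs_level p
```

<p>With the three sample items taken from the table above, only the protein-level name NP_005027.2:p.Val227Ala is printed.</p>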
<p>You just need to build a URL from the ID you want to look up:</p>
<p>http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs197278</p>
<p>and so on.</p>
<p>Or search by ID directly on NCBI's SNP page:</p>
<p><a href="http://www.ncbi.nlm.nih.gov/snp/1800234">http://www.ncbi.nlm.nih.gov/snp/1800234</a></p>
<div class="supp">
<pre class="snp_flanks">AACATGAACAAGGTCAAAGCCCGGG[A/C/T]CATCCTCTCAGGAAAGGCCAGTAAC
</pre>
<dl class="snpsum_dl_left_align">
<dt>Chromosome:</dt>
<dd>22:46219983</dd>
<dt>Gene:</dt>
<dd>PPARA (<a href="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?geneId=5465">GeneView</a>)</dd>
<dt>Functional Consequence:</dt>
<dd>intron variant,missense</dd>
<dt>Validated:</dt>
<dd>by 1000G,by cluster,by frequency</dd>
<dt><a href="http://www.ncbi.nlm.nih.gov/projects/SNP/docs/rs_attributes.html#gmaf">Global MAF:</a></dt>
<dd>C=0.0170/85</dd>
<dt>HGVS:</dt>
<dd><span class="snpsum_hgvs"><a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NC_000022.10&amp;search=NC_000022.10:g.46615880T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">NC_000022.10:g.46615880T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NC_000022.11&amp;search=NC_000022.11:g.46219983T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">NC_000022.11:g.46219983T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NG_012204.1&amp;search=NG_012204.1:g.74382T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">NG_012204.1:g.74382T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NM_001001928.2&amp;search=NM_001001928.2:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">NM_001001928.2:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NM_005036.4&amp;search=NM_005036.4:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">NM_005036.4:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NP_001001928.1&amp;search=NP_001001928.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">NP_001001928.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NP_005027.2&amp;search=NP_005027.2:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">NP_005027.2:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_005261653.1&amp;search=XM_005261653.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_005261653.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_005261654.1&amp;search=XM_005261654.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_005261654.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_005261655.1&amp;search=XM_005261655.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_005261655.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_005261655.2&amp;search=XM_005261655.2:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_005261655.2:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_005261656.1&amp;search=XM_005261656.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_005261656.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_005261656.2&amp;search=XM_005261656.2:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_005261656.2:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_005261657.1&amp;search=XM_005261657.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_005261657.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_005261658.1&amp;search=XM_005261658.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_005261658.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_006724269.2&amp;search=XM_006724269.2:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_006724269.2:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_006724270.2&amp;search=XM_006724270.2:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_006724270.2:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_011530239.1&amp;search=XM_011530239.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_011530239.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_011530240.1&amp;search=XM_011530240.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_011530240.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_011530241.1&amp;search=XM_011530241.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_011530241.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_011530242.1&amp;search=XM_011530242.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_011530242.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_011530243.1&amp;search=XM_011530243.1:c.680T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_011530243.1:c.680T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_011530244.1&amp;search=XM_011530244.1:c.278T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_011530244.1:c.278T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XM_011530245.1&amp;search=XM_011530245.1:c.278T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XM_011530245.1:c.278T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_005261710.1&amp;search=XP_005261710.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_005261710.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_005261711.1&amp;search=XP_005261711.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_005261711.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_005261712.1&amp;search=XP_005261712.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_005261712.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_005261713.1&amp;search=XP_005261713.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_005261713.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_005261714.1&amp;search=XP_005261714.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_005261714.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_005261715.1&amp;search=XP_005261715.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_005261715.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_006724332.1&amp;search=XP_006724332.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_006724332.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_006724333.1&amp;search=XP_006724333.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_006724333.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_011528541.1&amp;search=XP_011528541.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_011528541.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_011528542.1&amp;search=XP_011528542.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_011528542.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_011528543.1&amp;search=XP_011528543.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_011528543.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_011528544.1&amp;search=XP_011528544.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_011528544.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_011528545.1&amp;search=XP_011528545.1:p.Val227Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_011528545.1:p.Val227Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_011528546.1&amp;search=XP_011528546.1:p.Val93Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_011528546.1:p.Val93Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XP_011528547.1&amp;search=XP_011528547.1:p.Val93Ala&amp;v=1:100&amp;content=5" target="
                _blank
            ">XP_011528547.1:p.Val93Ala</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XR_244379.1&amp;search=XR_244379.1:n.735+1578T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XR_244379.1:n.735+1578T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XR_937869.1&amp;search=XR_937869.1:n.827+1578T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XR_937869.1:n.827+1578T&gt;C</a>, <a href="http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=XR_937870.1&amp;search=XR_937870.1:n.822+1582T%3EC&amp;v=1:100&amp;content=5" target="
                _blank
            ">XR_937870.1:n.822+1582T&gt;C</a></span></dd>
</dl>
</div>
<div class="aux">
<p class="links nohighlight"><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;DbFrom=snp&amp;Cmd=Link&amp;LinkName=snp_pubmed_cited&amp;LinkReadableName=Pubmed%20%28SNP%20Cited%29&amp;IdsFromResult=1800234"><span class="links nohighlight snpsum_text_icon">PubMed</span></a><a href="http://www.ncbi.nlm.nih.gov/projects/SNP/snp3D.cgi?rsnum=1800234">Protein3D</a></p>
</div>
<p>Many other databases offer the same kind of service:</p>
<p><b>For example, the 1000 Genomes Project interface provided by Ensembl:</b></p>
<div>
<div><a href="http://browser.1000genomes.org/Homo_sapiens/Variation/Explore?v=rs1800234">http://browser.1000genomes.org/Homo_sapiens/Variation/Explore?v=rs1800234</a></div>
<div></div>
<div>
<div class="rhs">
<div class="twocol-cell">
<p>This variation has <strong>11</strong> HGVS names:</p>
<div class="HGVS_names">
<div class="toggleable">
<blockquote><p><a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Location/View?contigviewbottom=variation_feature_variation%20%200normal;db=core;source=dbSNP;v=rs1800234;vdb=variation;vf=1229526">22</a>:g.46615880T&gt;C<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/Population?db=core;source=dbSNP;t=ENST00000493286.1;v=rs1800234;vdb=variation;vf=1229526">ENST00000493286</a>.1:n.890T&gt;C<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/Population?db=core;source=dbSNP;t=ENST00000262735.5;v=rs1800234;vdb=variation;vf=1229526">ENST00000262735</a>.5:c.680T&gt;C<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/ProtVariations?db=core;source=dbSNP;t=ENSP00000262735.5;v=rs1800234;vdb=variation;vf=1229526">ENSP00000262735</a>.5:p.Val227Ala<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/Population?db=core;source=dbSNP;t=ENST00000396000.2;v=rs1800234;vdb=variation;vf=1229526">ENST00000396000</a>.2:c.680T&gt;C<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/ProtVariations?db=core;source=dbSNP;t=ENSP00000379322.2;v=rs1800234;vdb=variation;vf=1229526">ENSP00000379322</a>.2:p.Val227Ala<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/Population?db=core;source=dbSNP;t=ENST00000434345.2;v=rs1800234;vdb=variation;vf=1229526">ENST00000434345</a>.2:c.508+1582T&gt;C<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/Population?db=core;source=dbSNP;t=ENST00000407236.1;v=rs1800234;vdb=variation;vf=1229526">ENST00000407236</a>.1:c.680T&gt;C<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/ProtVariations?db=core;source=dbSNP;t=ENSP00000385523.1;v=rs1800234;vdb=variation;vf=1229526">ENSP00000385523</a>.1:p.Val227Ala<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/Population?db=core;source=dbSNP;t=ENST00000402126.1;v=rs1800234;vdb=variation;vf=1229526">ENST00000402126</a>.1:c.680T&gt;C<br />
<a class="constant" href="http://browser.1000genomes.org/Homo_sapiens/Transcript/ProtVariations?db=core;source=dbSNP;t=ENSP00000385246.1;v=rs1800234;vdb=variation;vf=1229526">ENSP00000385246</a>.1:p.Val227Ala</p></blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
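<p>The HGVS names listed above all share one shape: an accession (with an optional version), a coordinate type (g, c, n or p), and a description of the change. As a minimal sketch, assuming only this simplified shape (the regular expression and the <code>parse_hgvs</code> helper are hypothetical illustrations, not a full HGVS parser), the names could be split like this:</p>

```python
import re

# Simplified pattern: accession, optional ".version", ":", one of g/c/n/p, ".", change.
# This is an illustrative assumption and does not cover the full HGVS nomenclature.
HGVS_RE = re.compile(r"^([^:.]+)(?:\.(\d+))?:([gcnp])\.(.+)$")


def parse_hgvs(name):
    """Split one HGVS name into its parts; return None if it does not match."""
    match = HGVS_RE.match(name.strip())
    if match is None:
        return None
    accession, version, kind, change = match.groups()
    return {
        "accession": accession,
        "version": int(version) if version else None,
        "kind": kind,        # g = genomic, c = coding, n = non-coding, p = protein
        "change": change,
    }


# A few of the names shown on the dbSNP and Ensembl pages for rs1800234.
names = [
    "22:g.46615880T>C",
    "ENST00000262735.5:c.680T>C",
    "ENSP00000262735.5:p.Val227Ala",
    "XR_244379.1:n.735+1578T>C",
]
for name in names:
    print(parse_hgvs(name))
```

<p>Grouping the parsed records by <code>kind</code> makes it easy to see that one rs ID maps to a single genomic change but to many transcript-level and protein-level descriptions.</p>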
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1520.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The University of Washington annotated all variant data with its own method and offers it for download</title>
		<link>http://www.bio-info-trainee.com/1344.html</link>
		<comments>http://www.bio-info-trainee.com/1344.html#comments</comments>
		<pubDate>Thu, 14 Jan 2016 12:16:34 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础数据库]]></category>
		<category><![CDATA[CADD]]></category>
		<category><![CDATA[变异]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1344</guid>
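		<description><![CDATA[The University of Washington annotated all variant data with its own method and made the results available for download: The paper is: Ki &#8230; <a href="http://www.bio-info-trainee.com/1344.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div><span style="font-family: Times New Roman;">The University of Washington annotated all variant data with <b>its own method</b> and made the results available for download:</span></div>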
<div>
<div><span style="font-family: Times New Roman;">The paper is: Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. </span></div>
<p><span style="font-family: Times New Roman;"><i>A general framework for estimating the relative pathogenicity of human genetic variants.</i><br />
Nat Genet. 2014 Feb 2. doi: <a href="http://dx.doi.org/10.1038/ng.2892">10.1038/ng.2892</a>.<br />
PubMed PMID: <a href="http://www.ncbi.nlm.nih.gov/pubmed/24487276">24487276</a>.<br />
</span></div>
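<div><span style="font-family: Times New Roman;">The paper's argument is that most current variant annotation methods are one-dimensional: they typically look at whether the site is conserved, how the change affects protein function, which domain it falls in, and so on.</span></div>
<div><span style="font-family: Times New Roman;">That is far from enough, so the authors proposed a new annotation method, CADD, and used it to annotate and score about 8.6 billion variants: every possible single-nucleotide substitution in the human reference genome, which also covers the sites found in existing public databases.</span></div>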
<div><span style="font-family: Times New Roman;">C scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects and complex trait associations, and they highly rank known pathogenic variants within individual genomes.<br />
</span></div>
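<div><span style="font-family: Times New Roman;">In short, by the authors' account, their method is unmatched!</span></div>
<div><span style="font-family: Times New Roman;">All of their pre-annotated data can be downloaded from: <a href="http://cadd.gs.washington.edu/download" target="_blank">http://cadd.gs.washington.edu/download</a></span></div>
<div>This data is often very useful, especially when you want to cross-check it against your own variant calls or run some statistics!</div>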
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard5.png"><img class="alignnone size-full wp-image-1345" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard5.png" alt="clipboard" width="708" height="536" /></a></div>
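<div>The human genome has only about 3 billion positions, yet they scored 8.6 billion variants, because each position has three possible alternative bases!</div>
<div>As a result the compressed download is over 300 GB, so I suspect most companies or labs will not bother with this data!</div>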
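<div>If you do want to cross-check your own calls against such a large table, you can stream it line by line and keep only the rows you need instead of loading everything. The sketch below assumes the score file is tab-separated with the columns Chrom, Pos, Ref, Alt, RawScore and PHRED, and that header lines start with <code>#</code>; the <code>load_scores</code> helper and all score values in the demo are made up for illustration.</div>

```python
import io


def load_scores(handle, wanted):
    """Stream a CADD-style TSV and keep only the variants listed in `wanted`.

    `wanted` is a set of (chrom, pos, ref, alt) tuples. The column layout
    (Chrom, Pos, Ref, Alt, RawScore, PHRED) is an assumption about the file.
    """
    found = {}
    for line in handle:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, ref, alt, raw, phred = line.rstrip("\n").split("\t")[:6]
        key = (chrom, int(pos), ref, alt)
        if key in wanted:
            found[key] = (float(raw), float(phred))
    return found


# Tiny in-memory stand-in for the real download; the scores are NOT real values.
demo = io.StringIO(
    "## hypothetical header line\n"
    "#Chrom\tPos\tRef\tAlt\tRawScore\tPHRED\n"
    "22\t46615880\tT\tA\t0.10\t3.5\n"
    "22\t46615880\tT\tC\t1.25\t12.0\n"
    "22\t46615880\tT\tG\t0.40\t6.1\n"
)

# Our own call: the T-to-C change at 22:46615880.
my_calls = {("22", 46615880, "T", "C")}
scores = load_scores(demo, my_calls)
print(scores)
```

<div>For the real compressed file, the same function should work on a handle opened with <code>gzip.open(path, "rt")</code>. Note that each position appears three times in the demo, once per alternative base, which is why the full table is roughly three times the genome length.</div>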
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1344.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
