<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Variant Sites</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e5%8f%98%e5%bc%82%e4%bd%8d%e7%82%b9/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>You are welcome to join the discussion on the forum at biotrainee.com, or to follow the WeChat official account of the same name, biotrainee.</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>[Live] My Genome (6): Preparing Variant Annotation Databases</title>
		<link>http://www.bio-info-trainee.com/2028.html</link>
		<comments>http://www.bio-info-trainee.com/2028.html#comments</comments>
		<pubDate>Wed, 23 Nov 2016 02:05:27 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Live: My Personal Genome]]></category>
		<category><![CDATA[Variant Sites]]></category>
		<category><![CDATA[Genome]]></category>
		<category><![CDATA[Live]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2028</guid>
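		<description><![CDATA[Whole-genome sequencing of a single person typically yields about four million SNVs (single-base positions that differ from the &#8230; <a href="http://www.bio-info-trainee.com/2028.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Whole-genome sequencing of a single person typically yields about four million SNVs (single-base positions that differ from the reference genome) and about five hundred thousand indels (insertions or deletions). The results are usually delivered as a VCF file (look up the VCF format if you are not familiar with it), for example:</p>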
<p><img src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVdScAHcvnjDwx7HodUtCiaOxHzWOVmoz2qnwrXBQuo2mot9zYsAibdR3A/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-s="300,640" data-type="png" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVdScAHcvnjDwx7HodUtCiaOxHzWOVmoz2qnwrXBQuo2mot9zYsAibdR3A/0?wx_fmt=png" data-ratio="0.5197044334975369" data-w="812" data-fail="0" /></p>
<section class="" data-source="bj.96weixin.com">
<section>
<section class=""></section>
<section class="">
<section></section>
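<section>Clearly, most people cannot tell what these variant sites actually mean. All we can read off is that position 1230237 on chromosome 20 was originally a T and has mutated to a G. Naturally we want to know more. Is this site inside a gene? If so, is it in an exon or an intron? Does the mutation change the gene's function? Does it affect transcription or translation? Do other healthy people carry a variant at this site? If so, in which populations? Has this variant also been found in cancer patients? If so, in which cancers? To answer these questions we need to download a series of variant annotation databases, so that we can interpret our four million SNVs and five hundred thousand indels from every angle. Let's prepare the databases together.</p>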
</section>
</section>
</section>
</section>
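<p>To make the example concrete, here is a minimal shell sketch of how the core fields of one VCF record can be read. The record is a made-up line that only mirrors the chromosome 20 example in the text (position 1230237, T to G). The function name <code>describe_variant</code> and the ID, QUAL, FILTER and INFO values are assumptions for illustration and are not taken from the screenshot.</p>

```shell
#!/usr/bin/env bash
# Print chromosome, position, reference allele and alternate allele
# for every data line of a VCF stream read from standard input.
# VCF columns are tab-separated: CHROM POS ID REF ALT QUAL FILTER INFO ...
describe_variant() {
    awk -F'\t' '!/^#/ { printf "%s:%s %s>%s\n", $1, $2, $4, $5 }'
}

# Hypothetical record mirroring the example in the text.
printf '20\t1230237\t.\tT\tG\t50\tPASS\t.\n' | describe_variant
```

<p>Header lines, which start with #, are skipped, so the same function can be fed a whole uncompressed VCF file.</p>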
<p><span id="more-2028"></span></p>
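<p>TCGA is one of the largest databases of cancer genomic information. Its collection of somatic mutations is especially important: it gathers the somatic mutations reported for each cancer type in the TCGA project. If the variant file from our own sample overlaps with it, that is a warning sign. Download it with the code below.</p>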
<section class="" data-source="bj.96weixin.com">
<section>wget https://gdc-docs.nci.nih.gov/Data/Release_Notes/Manifests/GDC_open_MAFs_manifest.txt</p>
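<p># loop over the file names in column 2 of the manifest</p>
<p>for i in $(cut -f 2 GDC_open_MAFs_manifest.txt)</p>
<p>do</p>
<p>echo "$i"</p>
<p># the 4th dot-separated field of the file name is used as the download ID</p>
<p>address=$(echo "$i" | cut -d'.' -f 4)</p>
<p># the remaining fields (1-3 and 5-7) form the local file name</p>
<p>filename=$(echo "$i" | cut -d'.' -f 1-3,5-7)</p>
<p>echo "$address" "$filename"</p>
<p>wget -O "$filename" "https://gdc-api.nci.nih.gov/data/$address"</p>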
<p>done</p>
</section>
</section>
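<p>The field splitting in the loop above is easier to follow on a single name. The sketch below applies the same two <code>cut</code> calls to one file name without downloading anything. The function name <code>split_maf_name</code> and the example file name are hypothetical; real manifest entries may be laid out differently, so check a few lines of the manifest before relying on the field numbers.</p>

```shell
#!/usr/bin/env bash
# Split a manifest file name into the download ID (4th dot-separated field)
# and the local file name (fields 1-3 and 5-7), then print both.
split_maf_name() {
    local name="$1"
    local address filename
    address=$(printf '%s\n' "$name" | cut -d'.' -f 4)
    filename=$(printf '%s\n' "$name" | cut -d'.' -f 1-3,5-7)
    printf '%s %s\n' "$address" "$filename"
}

# Hypothetical file name, used only to show the splitting.
split_maf_name "TCGA.ACC.muse.0123abcd.somatic.maf.gz"
```

<p>For this made-up name the function prints the ID 0123abcd followed by TCGA.ACC.muse.somatic.maf.gz, which is the pair the download loop passes to wget.</p>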
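<p>Some databases require registration, so I cannot give a direct download link for them. COSMIC is one example: it is also a cancer database, and we would not want its mutations to show up in a healthy person either. The screenshot below shows the registration page.</p>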
<p><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVYnA7uUzQu9Z3qE4yx2aVrGicdqs3xZ7RXtR7wVU8VReK02fiaiabrpd8g/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVYnA7uUzQu9Z3qE4yx2aVrGicdqs3xZ7RXtR7wVU8VReK02fiaiabrpd8g/0?wx_fmt=png" data-type="png" data-ratio="0.8552278820375335" data-w="746" data-fail="0" /></p>
<section class="" data-source="bj.96weixin.com">
<section><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVc8CcBe53nW0WWibcVfxrrqTNEkpvWlo2VEEP5yvwt5iaafTalxIeROgQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-ratio="0.027624309392265192" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVc8CcBe53nW0WWibcVfxrrqTNEkpvWlo2VEEP5yvwt5iaafTalxIeROgQ/0?wx_fmt=png" data-type="png" data-w="724" data-fail="0" /></section>
</section>
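<p>For databases of healthy individuals, such as the 1000 Genomes Project, we do the opposite: any variant from our healthy sample that also appears in such a database is filtered out and not studied further, because a variant that healthy people carry is probably harmless (although this is not absolute).</p>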
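<p>As a rough illustration of this filtering idea, the sketch below keeps only the variants of a sample that are absent from a population file. It is a simplification under stated assumptions: both inputs are small, uncompressed, tab-separated files with CHROM, POS, ID, REF and ALT in the first five columns, the population file is not empty, and a variant counts as the same only when all of CHROM, POS, REF and ALT match. The function name <code>filter_known_variants</code> and the file names are hypothetical. Real databases are compressed and contain multi-allelic records, so a dedicated annotation tool is the better choice in practice.</p>

```shell
#!/usr/bin/env bash
# Print the sample variants that do not occur in the population file.
# A variant is identified by CHROM, POS, REF and ALT (columns 1, 2, 4, 5).
filter_known_variants() {
    local population="$1" sample="$2"
    awk -F'\t' '
        /^#/ { next }                                  # skip VCF header lines
        NR == FNR { seen[$1, $2, $4, $5] = 1; next }   # first file: remember population variants
        !(($1, $2, $4, $5) in seen)                    # second file: print variants not seen before
    ' "$population" "$sample"
}

# Tiny made-up inputs: one shared variant and one variant unique to the sample.
workdir=$(mktemp -d)
printf '20\t1230237\t.\tT\tG\n' > "$workdir/population.tsv"
printf '20\t1230237\t.\tT\tG\n20\t1500000\t.\tA\tC\n' > "$workdir/sample.tsv"
filter_known_variants "$workdir/population.tsv" "$workdir/sample.tsv"
rm -r "$workdir"
```

<p>Only the second sample line survives, because the first one is also present in the population file.</p>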
<section class="" data-source="bj.96weixin.com">
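<section class="" data-source="bj.96weixin.com">The 1000 Genomes Project provides genotype data for 26 populations grouped into 5 super-populations. Comparing our own genotype file against them gives a rough estimate of ancestry. We also download several national genome projects, each of which targets a specific population.</p>
<p>Download the 1000 Genomes data.</p>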
<p>mkdir -p ~/annotation/variation/human/1000genomes</p>
<p>cd ~/annotation/variation/human/1000genomes</p>
<p>## ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/</p>
<p>nohup wget  -c -r -nd -np -k -L -p ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502 &amp;</p>
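<p>There are several other commonly used databases that I will not introduce one by one. The lines starting with # give the home page or download location of each database, so you can look them up yourself.</p>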
<section class="" data-source="bj.96weixin.com">
<section><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVc8CcBe53nW0WWibcVfxrrqTNEkpvWlo2VEEP5yvwt5iaafTalxIeROgQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-ratio="0.027624309392265192" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVc8CcBe53nW0WWibcVfxrrqTNEkpvWlo2VEEP5yvwt5iaafTalxIeROgQ/0?wx_fmt=png" data-type="png" data-w="724" data-fail="0" /></section>
</section>
<p>mkdir -p ~/annotation/variation/human/ExAC</p>
<p>cd ~/annotation/variation/human/ExAC</p>
<p>## http://exac.broadinstitute.org/</p>
<p>## ftp://ftp.broadinstitute.org/pub/ExAC_release/current</p>
<p>wget ftp://ftp.broadinstitute.org/pub/ExAC_release/current/ExAC.r0.3.1.sites.vep.vcf.gz.tbi</p>
<p>nohup wget ftp://ftp.broadinstitute.org/pub/ExAC_release/current/ExAC.r0.3.1.sites.vep.vcf.gz &amp;</p>
<p>wget ftp://ftp.broadinstitute.org/pub/ExAC_release/current/cnv/exac-final-cnv.gene.scores071316</p>
<p>wget <a>ftp://ftp.broadinstitute.org/pub/ExAC_release/current/cnv/exac-final.autosome-1pct-sq60-qc-prot-coding.cnv.bed</a></p>
<section class="" data-source="bj.96weixin.com">
<section><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVc8CcBe53nW0WWibcVfxrrqTNEkpvWlo2VEEP5yvwt5iaafTalxIeROgQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-ratio="0.027624309392265192" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVc8CcBe53nW0WWibcVfxrrqTNEkpvWlo2VEEP5yvwt5iaafTalxIeROgQ/0?wx_fmt=png" data-type="png" data-w="724" data-fail="0" /></section>
</section>
<p>mkdir -p ~/annotation/variation/human/dbSNP<br />
cd ~/annotation/variation/human/dbSNP</p>
<p>## https://www.ncbi.nlm.nih.gov/projects/SNP/</p>
<p>## ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/</p>
<p>## ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/</p>
<p>nohup wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/All_20160601.vcf.gz &amp;</p>
<p>wget <a>ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/All_20160601.vcf.gz.tbi</a></p>
<section class="" data-source="bj.96weixin.com"><img class="" src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVc8CcBe53nW0WWibcVfxrrqTNEkpvWlo2VEEP5yvwt5iaafTalxIeROgQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-ratio="0.027624309392265192" data-src="http://mmbiz.qpic.cn/mmbiz_png/cZNhZQ6j4wyLT9NWBvE2U5Iv9oicBYuJVc8CcBe53nW0WWibcVfxrrqTNEkpvWlo2VEEP5yvwt5iaafTalxIeROgQ/0?wx_fmt=png" data-type="png" data-w="724" data-fail="0" /></p>
<p>mkdir -p ~/annotation/variation/human/ESP6500</p>
<p>cd ~/annotation/variation/human/ESP6500</p>
<p># http://evs.gs.washington.edu/EVS/</p>
<p>nohup wget http://evs.gs.washington.edu/evs_bulk_data/ESP6500SI-V2-SSA137.GRCh38-liftover.snps_indels.vcf.tar.gz &amp;</p>
<section class="" data-source="bj.96weixin.com">
<section><img class="" src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/0?wx_fmt=png" data-ratio="0.027624309392265192" data-w="724" data-fail="0" /></section>
</section>
<p>mkdir -p ~/annotation/variation/human/UK10K</p>
<p>cd ~/annotation/variation/human/UK10K</p>
<p># http://www.uk10k.org/</p>
<p>nohup wget ftp://ngs.sanger.ac.uk/production/uk10k/UK10K_COHORT/REL-2012-06-02/UK10K_COHORT.20160215.sites.vcf.gz &amp;</p>
<section class="" data-source="bj.96weixin.com">
<section><img class="" src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/0?wx_fmt=png" data-ratio="0.027624309392265192" data-w="724" data-fail="0" /></section>
</section>
<p>mkdir -p ~/annotation/variation/human/gonl</p>
<p>cd ~/annotation/variation/human/gonl</p>
<p>## http://www.nlgenome.nl/search/</p>
<p>## https://molgenis26.target.rug.nl/downloads/gonl_public/variants/release5/</p>
<p>nohup wget  -c -r -nd -np -k -L -p https://molgenis26.target.rug.nl/downloads/gonl_public/variants/release5  &amp;</p>
<section class="" data-source="bj.96weixin.com"><img class="" src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/0?wx_fmt=png" data-ratio="0.027624309392265192" data-w="724" data-fail="0" />## 1 million single nucleotide polymorphisms (SNPs) for DNA samples from each of the three ethnic groups in Singapore – Chinese, Malays and Indians.</p>
<p>## The Affymetrix Genome-Wide Human SNP Array 6.0   &amp;&amp; The Illumina Human1M single BeadChip</p>
<p>## http://www.statgen.nus.edu.sg/~SGVP/</p>
<p>## http://www.statgen.nus.edu.sg/~SGVP/singhap/files-website/samples-information.txt</p>
<p># http://www.statgen.nus.edu.sg/~SGVP/singhap/files-website/genotypes/2009-01-30/QC/</p>
<p><img class="" src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/0?wx_fmt=png" data-ratio="0.027624309392265192" data-w="724" data-fail="0" />## Singapore Sequencing Malay Project (SSMP)</p>
<p>mkdir -p ~/annotation/variation/human/SSMP</p>
<p>cd ~/annotation/variation/human/SSMP</p>
<p>## http://www.statgen.nus.edu.sg/~SSMP/</p>
<p>## http://www.statgen.nus.edu.sg/~SSMP/download/vcf/2012_05</p>
</section>
<section class="" data-source="bj.96weixin.com">
<section><img class="" src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1" alt="" data-src="http://mmbiz.qpic.cn/mmbiz/wyice8kFQhf5geQK3gu2FUugjB8iaSGpjOwTCicEOIzAjhyFYzReiaBBVeO4ic3iawLdKUSAMYOdSn0Odibia2XM82KebQ/0?wx_fmt=png" data-ratio="0.027624309392265192" data-w="724" data-fail="0" /></section>
</section>
<p>## Singapore Sequencing Indian Project (SSIP)</p>
<p>mkdir -p ~/annotation/variation/human/SSIP</p>
<p>cd ~/annotation/variation/human/SSIP</p>
<p># http://www.statgen.nus.edu.sg/~SSIP/</p>
<p>## http://www.statgen.nus.edu.sg/~SSIP/download/vcf/dataFreeze_Feb2013</p>
<p>Scan the QR code below to follow us and get every post in this live series!</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/12.png"><img class="alignnone size-full wp-image-1965" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/12.png" alt="1" width="634" height="589" /></a></p>
</section>
</section>
</section>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2028.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
