<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; 生信人</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e7%94%9f%e4%bf%a1%e4%ba%ba/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum biotrainee.com, or follow the WeChat public account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Must-know FTP sites for bioinformaticians: 1000genomes</title>
		<link>http://www.bio-info-trainee.com/1841.html</link>
		<comments>http://www.bio-info-trainee.com/1841.html#comments</comments>
		<pubDate>Tue, 02 Aug 2016 12:10:07 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[tutorial]]></category>
		<category><![CDATA[1000genomes]]></category>
		<category><![CDATA[ftp]]></category>
		<category><![CDATA[variation]]></category>
		<category><![CDATA[生信人]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1841</guid>
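		<description><![CDATA[I won't dwell on the importance of the 1000 Genomes Project. Because it ran over a long time span, the final data cover more than just a thousand people &#8230; <a href="http://www.bio-info-trainee.com/1841.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>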
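				<content:encoded><![CDATA[<p>I won't dwell on the importance of the 1000 Genomes Project. Because it ran over a long time span, the final data cover more than just a thousand people: <b><span style="color: #ff0000;">the latest release contains 1182 individuals with IDs starting with NA and 1768 with IDs starting with HG!</span></b> A slide deck explains clearly how to download data through the data portal on the official website: <a href="https://www.genome.gov/pages/research/der/ichg-1000genomestutorial/how_to_access_the_data.pdf">https://www.genome.gov/pages/research/der/ichg-1000genomestutorial/how_to_access_the_data.pdf</a>. I don't like graphical interfaces and prefer to go straight into the FTP site and dig out the data I need myself. The 1000 Genomes Project has its own FTP site, and copies of the data can also be downloaded from NCBI, EBI, and the Sanger Institute, which makes it a very rich resource for getting started in bioinformatics!</p>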
<div><a href="ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/">ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/</a></div>
<div><a href="ftp://ftp.sanger.ac.uk/pub/1000genomes/">ftp://ftp.sanger.ac.uk/pub/1000genomes/</a></div>
<div><a href="ftp://ftp.ebi.ac.uk/pub/databases/1000genomes/">ftp://ftp.ebi.ac.uk/pub/databases/1000genomes/</a></div>
<div><a href="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/">ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/</a></div>
<p><span id="more-1841"></span></p>
<div><b><span style="color: #ff0000;">The 1000 Genomes Project sequenced 5 super-populations and 25 populations</span></b>, which are described in these two files:</div>
<div>
<pre>09/08/2014 12:00AM          1,663 20131219.populations.tsv
09/09/2014 12:00AM             97 20131219.superpopulations.tsv
</pre>
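<p>For most people, though, this FTP site is not really needed unless you want to download the project's raw data to practice bioinformatics analysis pipelines. The project has a very handy visual interface at EBI for browsing its variation results.</p></div>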
<div><span style="color: #ff0000;"><b>1000 Genomes Project genome browser:</b></span> <a href="http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/">http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/</a></div>
<div>
<div><a href="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs35761398">http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs35761398</a>  chr1:24201919:24201920</div>
<div><a href="http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=2501432">http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=2501432</a>  chr1:24201920</div>
<div><a href="http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=2502992">http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=2502992</a>  chr1:24201919</div>
<div>Looking up a single rs ID in the 1000 Genomes Project shows information for all of the populations:</div>
<div><a href="http://browser.1000genomes.org/Homo_sapiens/Variation/Population?r=1:24201420-24202420;v=rs2501432;vdb=variation;vf=1849472">http://browser.1000genomes.org/Homo_sapiens/Variation/Population?r=1:24201420-24202420;v=rs2501432;vdb=variation;vf=1849472</a></div>
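<div>This population information can be drawn as a network diagram! To look at another variant, just change the rs ID in the URL. Of course, not every rs ID has an entry in the 1000 Genomes Project.</div>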
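<p>As a rough sketch of "just change the rs ID", the snippet below fills an rs ID into the two URL patterns seen in the links above. The function names are hypothetical, and leaving out the <code>r=</code> and <code>vf=</code> parameters of the browser link is an assumption: the page may or may not resolve from the rs ID alone.</p>

```python
import re

# URL patterns copied from the links above. The "r=" and "vf=" parameters of
# the browser link are omitted here, on the assumption that "v=" is enough.
DBSNP_URL = "http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs={number}"
BROWSER_URL = (
    "http://browser.1000genomes.org/Homo_sapiens/Variation/Population"
    "?v={rsid};vdb=variation"
)


def normalize_rsid(rsid):
    """Accept 'rs2501432', 'RS2501432' or '2501432' and return 'rs2501432'."""
    match = re.fullmatch(r"(?:rs)?(\d+)", rsid.strip().lower())
    if match is None:
        raise ValueError(f"not an rs ID: {rsid!r}")
    return "rs" + match.group(1)


def variant_links(rsid):
    """Return the dbSNP and 1000 Genomes browser URLs for one rs ID."""
    rsid = normalize_rsid(rsid)
    return {
        "dbsnp": DBSNP_URL.format(number=rsid[2:]),
        "browser": BROWSER_URL.format(rsid=rsid),
    }


# The three rs IDs mentioned above.
for rsid in ["rs35761398", "rs2501432", "rs2502992"]:
    print(variant_links(rsid)["browser"])
```

<p>Normalizing the ID first keeps the two patterns consistent, since the dbSNP link takes the bare number while the browser link takes the full rs ID.</p>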
<p>There is also a Java program for visually exploring 1000 Genomes data:</p></div>
<div><a href="http://bioinformatics.oxfordjournals.org/content/early/2016/03/17/bioinformatics.btw147.short?rss=1">http://bioinformatics.oxfordjournals.org/content/early/2016/03/17/bioinformatics.btw147.short?rss=1</a>
<div><a href="http://limousophie35.github.io/Ferret/">http://limousophie35.github.io/Ferret/</a></div>
<p>It does not seem very easy to use, though!</p></div>
<div><b><span style="color: #ff0000;">All of the data can be downloaded from the project's main FTP site.</span></b></div>
<div>
<div><a href="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/">ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/</a></div>
<div><a href="ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/">ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/</a></div>
<div>Going straight to the latest release, there are 1182 individuals with IDs starting with NA and 1768 with IDs starting with HG!</div>
<div><a href="ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/phase3/data/">ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/phase3/data/</a></div>
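<p>The NA/HG tally above can be reproduced from a directory listing. The following is a minimal sketch using Python's standard <code>ftplib</code>; the host and path come from the link above, while the function names are hypothetical and anonymous FTP access to the server is an assumption.</p>

```python
from collections import Counter
from ftplib import FTP

# Host and path taken from the link above; whether the server still accepts
# anonymous FTP connections is an assumption.
HOST = "ftp.ncbi.nlm.nih.gov"
DATA_DIR = "/1000genomes/ftp/phase3/data/"


def count_by_prefix(sample_ids):
    """Tally sample IDs by their two-letter prefix (for example 'NA' or 'HG')."""
    return Counter(sample_id[:2].upper() for sample_id in sample_ids if sample_id)


def list_samples(host=HOST, data_dir=DATA_DIR):
    """Return the entry names under data_dir using an anonymous FTP login."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        # nlst may return full paths, so keep only the last path component
        return [name.rstrip("/").rsplit("/", 1)[-1] for name in ftp.nlst(data_dir)]


# Offline demonstration with three example sample IDs.
print(count_by_prefix(["NA12878", "HG00096", "HG00097"]))
```

<p>With network access, <code>count_by_prefix(list_samples())</code> would give the counts for the live directory.</p>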
<div>The data can also be browsed by population: <a href="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/">ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/</a></div>
<div>Each individual's directory contains four data folders:</div>
<div>Oct 01 2014 00:00    Directory alignment</div>
<div>Oct 01 2014 00:00    Directory exome_alignment</div>
<div>Oct 01 2014 00:00    Directory high_coverage_alignment</div>
<div>Oct 01 2014 00:00    Directory sequence_read</div>
<div>The amount of data here is truly rich!</div>
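<p>To script downloads for one individual, the four folder names can be joined onto the by-population root. This is only a sketch: the <code>base/POPULATION/SAMPLE/FOLDER/</code> layout is an assumption that should be checked against the live listing, and the function name and the example values <code>CEU</code> and <code>NA12878</code> are chosen for illustration.</p>

```python
# Root URL copied from the by-population link above.
BASE = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/"
# The four folder names from the directory listing above.
SUBFOLDERS = ("alignment", "exome_alignment", "high_coverage_alignment", "sequence_read")


def sample_folder_urls(population, sample_id, base=BASE):
    """Build one URL per data folder for a single individual.

    Assumes the layout base/POPULATION/SAMPLE/FOLDER/, which is a guess
    about the server rather than something stated in the text.
    """
    root = f"{base.rstrip('/')}/{population}/{sample_id}/"
    return {folder: root + folder + "/" for folder in SUBFOLDERS}


# CEU and NA12878 are example values chosen for illustration.
for folder, url in sample_folder_urls("CEU", "NA12878").items():
    print(folder, url)
```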
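<div>You can also go straight to the latest VCF files, which record every variant site for these two-thousand-plus individuals!</div>
<div>Every site is listed, down to whether each individual carries a variant at that site!</div>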
<div><a href="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/">ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/</a></div>
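<p>To make "whether each individual carries a variant at that site" concrete, here is a small sketch that reads one VCF record and reports, per sample, whether the genotype contains a non-reference allele. It relies only on the standard VCF column layout; the function name is hypothetical, and the two-sample record is made up for illustration and is not project data.</p>

```python
import re


def carriers(header_line, record_line):
    """Map each sample name to True if its genotype has a non-reference allele."""
    columns = header_line.rstrip("\n").lstrip("#").split("\t")
    fields = record_line.rstrip("\n").split("\t")
    samples = columns[9:]  # the first nine VCF columns are fixed
    gt_index = fields[8].split(":").index("GT")  # position of GT within FORMAT
    result = {}
    for sample, value in zip(samples, fields[9:]):
        genotype = value.split(":")[gt_index]
        alleles = re.split(r"[|/]", genotype)  # phased (|) or unphased (/)
        # "0" is the reference allele and "." is a missing call
        result[sample] = any(allele not in ("0", ".") for allele in alleles)
    return result


# Made-up header and record for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B"
record = "1\t1000\t.\tA\tG\t100\tPASS\t.\tGT\t0|0\t0|1"
print(carriers(header, record))
```

<p>Real release files are compressed and very large, so in practice the lines would be streamed (for example with <code>gzip.open</code>) rather than held in memory.</p>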
<div>Note, however, that the genotypes were called with MVNcall+SHAPEIT; for the underlying method see: <a href="http://www.ncbi.nlm.nih.gov/pubmed/23093610">http://www.ncbi.nlm.nih.gov/pubmed/23093610</a></div>
<div>The site also provides some tutorials: <a href="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/working/">ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/working/</a></div>
</div>
<div>Sequencing data can certainly be downloaded from the official 1000 Genomes Project website, mainly as variants in VCF format!
<div>
<ul>
<li>Coriell Catalog website: <a href="https://catalog.coriell.org/1/NHGRI/Collections/1000-Genomes-Collections/1000-Genomes-Project" target="_blank">1000 Genomes Project</a></li>
<li>1000 Genomes website: <a href="http://browser.1000genomes.org/index.html" target="_blank">browser.1000genomes.org/index.html</a> (by SNP ID)</li>
<li>1000 Genomes website: <a href="http://www.1000genomes.org/data" target="_blank">www.1000genomes.org/data</a> (bulk data)</li>
</ul>
</div>
<div><b><span style="color: #ff0000;">Its expression data, however, is not that simple to get!</span></b></div>
<div>
<p>The most important available existing expression datasets involving 1000g individuals are probably the following:</p>
<p><strong>RNAseq (mRNA &amp; miRNA) on 465 individuals (CEU, TSI, GBR, FIN, YRI)</strong></p>
<p>Pre-publication RNA-sequencing data from the Geuvadis project is available through <a href="http://www.geuvadis.org/web/geuvadis/home" target="_blank">http://www.geuvadis.org</a></p>
<p><a href="http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/samples.html">http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/samples.html</a><br />
<a href="http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-2/samples.html">http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-2/samples.html</a></p>
<p><strong>RNAseq on 60 CEU individuals <sup>[1]</sup></strong></p>
<p><a href="http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-197" target="_blank">http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-197</a></p>
<p><strong>Expression arrays on about 800 HapMap 3 individuals with a lot of overlap with 1000g data</strong><b> </b><strong><sup>[1,2]</sup></strong></p>
<p><a href="http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-198" target="_blank">http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-198</a><br />
<a href="http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-264" target="_blank">http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-264</a></p>
<p><strong>RNAseq for 69 YRI individuals</strong><b> </b><strong><sup>[3]</sup></strong></p>
<p><a href="http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-19480" target="_blank">http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-19480</a></p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1841.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Must-know FTP sites for bioinformaticians: NCBI-GEO</title>
		<link>http://www.bio-info-trainee.com/1835.html</link>
		<comments>http://www.bio-info-trainee.com/1835.html#comments</comments>
		<pubDate>Tue, 02 Aug 2016 11:48:19 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[tutorial]]></category>
		<category><![CDATA[ftp]]></category>
		<category><![CDATA[GEO]]></category>
		<category><![CDATA[ncbi]]></category>
		<category><![CDATA[生信人]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1835</guid>
		<description><![CDATA[I won't dwell on the importance of NCBI. Gene Expression Omnibus d &#8230; <a href="http://www.bio-info-trainee.com/1835.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
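				<content:encoded><![CDATA[<p>I won't dwell on the importance of NCBI. The <a href="http://www.ncbi.nlm.nih.gov/geo/">Gene Expression Omnibus database (GEO)</a> is a database maintained by NCBI. It was originally designed to collect and curate expression microarray data, but it later took in methylation arrays, lncRNA, miRNA, and CNV arrays, and even high-throughput sequencing data! All of the data can be downloaded from the FTP site: <a href="ftp://ftp-trace.ncbi.nih.gov/geo/">ftp://ftp-trace.ncbi.nih.gov/geo/</a><span id="more-1835"></span></p>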
<p>First, on the <a href="http://www.ncbi.nlm.nih.gov/geo/">GEO home page</a> we can see:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/GEO_stat.png"><img class="alignnone size-full wp-image-1836" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/GEO_stat.png" alt="GEO_stat" width="273" height="176" /></a></p>
<p>The statistics above are as of August 2, 2016; the volume of data is already enormous.</p>
<h2><a href="http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/geo/">GEO database basics</a></h2>
<ul>
<li>GEO Platform (GPL): array platform ID</li>
<li>GEO Sample (GSM): sample ID</li>
<li>GEO Series (GSE): study ID</li>
<li>GEO Dataset (GDS): dataset ID</li>
</ul>
<p>All of these can be downloaded directly from the FTP site:</p>
<p>FTP directory /geo/ at ftp-trace.ncbi.nih.gov</p>
<pre>08/02/2016 05:39AM      Directory <a href="/geo/datasets/"><b>datasets</b></a>
08/02/2016 05:39AM      Directory <a href="/geo/platforms/"><b>platforms</b></a>
08/02/2016 05:39AM      Directory <a href="/geo/samples/"><b>samples</b></a>
08/02/2016 05:39AM      Directory <a href="/geo/series/"><b>series</b></a>
</pre>
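<p>The four accession prefixes line up with the four FTP directories in the listing above. A minimal sketch of that mapping follows; the function name is hypothetical, and the pairing of prefix to directory is inferred from the names rather than stated in the listing.</p>

```python
import re

# Accession prefix to top-level FTP directory, inferred from the listing above.
FTP_DIRECTORIES = {
    "GDS": "datasets",
    "GPL": "platforms",
    "GSM": "samples",
    "GSE": "series",
}


def ftp_directory(accession):
    """Return the top-level /geo/ directory that holds a GEO accession."""
    match = re.fullmatch(r"(GDS|GPL|GSM|GSE)\d+", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return FTP_DIRECTORIES[match.group(1)]


print(ftp_directory("GSE75528"))
```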
<p>The URLs follow <strong><span style="color: #ff0000;">a very regular pattern! (Be sure to take note of it.)</span></strong></p>
<div>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75528">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75528</a></div>
</div>
<div>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE74311">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE74311</a></div>
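<div>We usually start from a GSE study ID and simply edit the URL above to see all of the descriptive information about that study: which platform was used (microarray or high-throughput sequencing), how many samples were assayed, and which paper it comes from!</div>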
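<div>All of the data you need can be downloaded, and all of it can be located on the FTP site above by following the <strong><span style="color: #ff0000;">pattern</span></strong>. You can even assemble the download URLs yourself for batch processing!</div>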
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/1.png"><img class="alignnone size-full wp-image-1838" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/1.png" alt="1" width="603" height="318" /></a></div>
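<p>A sketch of assembling such URLs for a batch of GSE IDs. The query URL pattern is the one shown above. The FTP folder rule (series grouped into folders where the last three digits of the ID are replaced by <code>nnn</code>, with <code>matrix/</code> and <code>suppl/</code> subfolders) is my understanding of the GEO FTP layout and not something the text states, so treat it as an assumption. The function name is hypothetical.</p>

```python
import re

# Query pattern and FTP root copied from the links above.
QUERY_URL = "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}"
FTP_ROOT = "ftp://ftp-trace.ncbi.nih.gov/geo/"


def series_urls(gse):
    """Return the web page and assumed FTP folders for one GSE accession."""
    acc = gse.strip().upper()
    if re.fullmatch(r"GSE\d+", acc) is None:
        raise ValueError(f"not a GSE accession: {gse!r}")
    digits = acc[3:]
    # Assumed grouping rule: GSE75528 lives under GSE75nnn, GSE123 under GSEnnn.
    stub = "GSE" + digits[:-3] + "nnn"
    folder = f"{FTP_ROOT}series/{stub}/{acc}/"
    return {
        "page": QUERY_URL.format(acc=acc),
        "ftp": folder,
        "matrix": folder + "matrix/",  # assumed subfolder name
        "suppl": folder + "suppl/",    # assumed subfolder name
    }


# Batch over the two study IDs mentioned above.
for gse in ["GSE75528", "GSE74311"]:
    print(series_urls(gse)["ftp"])
```

<p>Returning a dictionary per study keeps the batch loop simple: each value can be handed to whatever download tool is preferred.</p>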
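<div>For microarray data, you need to read the GPL platform's annotation for each probe carefully before you can make good use of other people's data.</div>
<div>For high-throughput sequencing data, you usually also need to go to the SRA entry linked to the GSE, download the sra files, convert them to fastq format, and process them yourself!</div>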
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1835.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
