<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; GEO</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/geo/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>FTP Sites Every Bioinformatician Must Learn: NCBI GEO</title>
		<link>http://www.bio-info-trainee.com/1835.html</link>
		<comments>http://www.bio-info-trainee.com/1835.html#comments</comments>
		<pubDate>Tue, 02 Aug 2016 11:48:19 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[tutorial]]></category>
		<category><![CDATA[ftp]]></category>
		<category><![CDATA[GEO]]></category>
		<category><![CDATA[ncbi]]></category>
		<category><![CDATA[生信人]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1835</guid>
		<description><![CDATA[NCBI needs no introduction. The Gene Expression Omnibus d &#8230; <a href="http://www.bio-info-trainee.com/1835.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>NCBI needs no introduction. The <a href="http://www.ncbi.nlm.nih.gov/geo/">Gene Expression Omnibus database (GEO)</a> is a database maintained by NCBI. It was originally designed to collect and curate expression microarray data, but it later took in methylation arrays, lncRNA arrays, miRNA arrays, CNV arrays and other array types, and even high-throughput sequencing data. All of the data can be downloaded from the FTP site: <a href="ftp://ftp-trace.ncbi.nih.gov/geo/">ftp://ftp-trace.ncbi.nih.gov/geo/</a><span id="more-1835"></span></p>
<p>First, the <a href="http://www.ncbi.nlm.nih.gov/geo/">GEO home page</a> shows:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/GEO_stat.png"><img class="alignnone size-full wp-image-1836" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/GEO_stat.png" alt="GEO_stat" width="273" height="176" /></a></p>
<p>As of August 2, 2016, the statistics were as shown above; the volume of data is already enormous.</p>
<h2><a href="http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/geo/">GEO database basics</a></h2>
<ul>
<li>GEO Platform (GPL): the array platform</li>
<li>GEO Sample (GSM): the sample ID</li>
<li>GEO Series (GSE): the study ID</li>
<li>GEO Dataset (GDS): the dataset ID</li>
</ul>
<p>All of these can be downloaded directly from the FTP site:</p>
<p>FTP directory /geo/ at ftp-trace.ncbi.nih.gov</p>
<pre>08/02/2016 05:39AM      Directory <a href="/geo/datasets/"><b>datasets</b></a>
08/02/2016 05:39AM      Directory <a href="/geo/platforms/"><b>platforms</b></a>
08/02/2016 05:39AM      Directory <a href="/geo/samples/"><b>samples</b></a>
08/02/2016 05:39AM      Directory <a href="/geo/series/"><b>series</b></a>
</pre>
<p>The URLs follow <strong><span style="color: #ff0000;">a regular pattern (be sure to take note of it)</span></strong>.</p>
<div>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75528">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75528</a></div>
</div>
<div>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE74311">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE74311</a></div>
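<div>We usually start from a GSE study ID and simply edit the URL above to see the full description of that study: which platform was used (microarray or high-throughput sequencing), how many samples were measured, and which paper the data come from.</div>
<div>All the data you need can be downloaded, and all of it can be located on the FTP site above by following the <strong><span style="color: #ff0000;">pattern</span></strong>. You can even assemble the download URLs yourself for batch processing.</div>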
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/1.png"><img class="alignnone size-full wp-image-1838" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/1.png" alt="1" width="603" height="318" /></a></div>
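<div>For microarray data, you need to read the per-probe annotation in the GPL platform record carefully before you can make good use of other people's data.</div>
<div>For high-throughput sequencing data, you generally also go to the SRA record linked to the GSE, download the sra files, convert them to fastq, and process them yourself.</div>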
</div>
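<p>The URL pattern described above can be sketched in a few lines of Python. This is a minimal illustration rather than an official client. The function names are made up for this example, and the rule that the bucket directory replaces the last three digits of the accession with <code>nnn</code> is an assumption about the FTP layout that should be checked against the live directory listing.</p>

```python
import re

# Accession prefix and the FTP subdirectory that holds it (from the listing above).
FTP_SUBDIRS = {"GSE": "series", "GSM": "samples", "GPL": "platforms", "GDS": "datasets"}

ACC_PAGE = "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
FTP_ROOT = "ftp://ftp-trace.ncbi.nih.gov/geo"


def accession_page(acc):
    """Return the web page URL that describes a GEO accession."""
    return f"{ACC_PAGE}?acc={acc}"


def ftp_directory(acc):
    """Return the FTP directory for a GEO accession.

    Assumption: records are bucketed by replacing the last three digits of
    the accession with 'nnn' (GSE75528 goes under GSE75nnn, GPL570 under GPLnnn).
    """
    match = re.fullmatch(r"(GSE|GSM|GPL|GDS)(\d+)", acc)
    if match is None:
        raise ValueError(f"not a GEO accession: {acc!r}")
    prefix, digits = match.groups()
    bucket = f"{prefix}{digits[:-3]}nnn"
    return f"{FTP_ROOT}/{FTP_SUBDIRS[prefix]}/{bucket}/{acc}/"


if __name__ == "__main__":
    for acc in ["GSE75528", "GSE74311"]:
        print(accession_page(acc))
        print(ftp_directory(acc))
```

<p>Looping over a list of study IDs in this way is one route to the batch processing mentioned above.</p>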
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1835.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Six Ways to Download All the ENCODE Project Data</title>
		<link>http://www.bio-info-trainee.com/1825.html</link>
		<comments>http://www.bio-info-trainee.com/1825.html#comments</comments>
		<pubDate>Thu, 28 Jul 2016 14:50:00 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[CHIP-seq]]></category>
		<category><![CDATA[broad]]></category>
		<category><![CDATA[encode]]></category>
		<category><![CDATA[GEO]]></category>
		<category><![CDATA[iHEC]]></category>
		<category><![CDATA[UCSC]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1825</guid>
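		<description><![CDATA[The Encyclopedia of DNA Elements,  &#8230; <a href="http://www.bio-info-trainee.com/1825.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>The Encyclopedia of DNA Elements (ENCODE) project needs no introduction. If you are not yet familiar with it, skip to the end of this post for the ENCODE tutorials and study them. The project uses the high-throughput sequencing assays listed below to profile genome-wide gene regulatory elements in more than 100 different cell lines and tissues. It originally covered human only; later, model organisms such as mouse and fly were profiled and analyzed as well (the fly and worm effort is called modENCODE).</div>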
<blockquote>
<div>
<ul>
<li>chromatin structure (5C)</li>
<li>open chromatin (DNase-seq and FAIRE-seq)</li>
<li>histone modifications and DNA-binding of over 100 transcription factors (ChIP-seq)</li>
<li>RNA transcription (RNAseq and CAGE)</li>
</ul>
</div>
</blockquote>
<p><span id="more-1825"></span></p>
<div>All of the data are now public (<a href="http://genome.ucsc.edu/ENCODE/">http://genome.ucsc.edu/ENCODE/</a>); <b><em>ENCODE results from 2007 and later are available from the ENCODE Project Portal</em></b>, <a href="https://encodeproject.org/" target="_blank">encodeproject.org</a>. The results were published simultaneously as 30 papers in Nature, Science, Cell, JBC, Genome Biol and Genome Research (<a href="http://www.nature.com/encode">http://www.nature.com/encode</a>).</div>
<div><b><span style="color: #ff0000;">Everything can be downloaded, from the raw sequencing data to the aligned signal files and the final, meaningful peak files.</span></b></div>
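<div>Based on my own experience, here is a brief introduction to the ways of downloading ENCODE data: <b><span style="color: #ff0000;">the official ENCODE portal, UCSC, ENSEMBL, the Broad Institute, IHEC, and GEO, six routes in all.</span></b></div>
<div><span style="color: #ff0000;"><b>First, UCSC:</b></span></div>
<div>The URL is <a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/</a>. Because you browse the files directly, you can pick out the data you are interested in from the folder layout and file names and download it any way you like, which is why this is the route I like best.</div>
<div>Many people are used to visualizing ChIP-seq results with the Genome Browser that UCSC provides, and the Genome Browser has many options for showing public tracks alongside your own data for comparison. It must therefore keep those data on a download server, and the best known of them are the ENCODE data, as shown below:</div>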
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/1.png"><img class="alignnone size-full wp-image-1826" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/1.png" alt="1" width="471" height="393" /></a></div>
<div>I am mostly interested in the ENCODE histone data, so click through to it.</div>
<div>Typically there is data for every histone mark in every cell line, and everything from the sequencing reads to the aligned bam files and the called peaks can be downloaded.</div>
<div></div>
<p><b><span style="color: #ff0000;">Next, the official ENCODE portal:</span></b></p>
<div>The ENCODE portal also documents its data-processing pipelines: <a href="https://www.encodeproject.org/pipelines/">https://www.encodeproject.org/pipelines/</a></div>
<div>
<div>RNA-seq pipelines</div>
<div>RAMPAGE pipeline</div>
<div>Chromatin pipelines(Histone ChIP-seq Pipeline/Transcription Factor ChIP-seq Pipeline)</div>
<div>Methylation pipeline(WGBS Pipeline Overview)</div>
</div>
<div>Data download on the portal works like a shopping site: add the datasets you need to a cart, then download them all together.</div>
<div>This document describes what <a href="https://www.encodeproject.org/help/getting-started/#info">data are available at the ENCODE Portal</a>, ways to <a href="https://www.encodeproject.org/help/getting-started/#use">get started searching and downloading data</a>, and an overview to how the <a href="https://www.encodeproject.org/help/getting-started/#organization">metadata describing the assays and reagents are organized</a>. ENCODE data can be visualized and accessed from <a href="https://www.encodeproject.org/about/data-access#other">other resources</a>, including the <a href="http://genome.ucsc.edu/ENCODE/">UCSC Genome Browser</a> and <a href="http://www.ensembl.org/info/website/tutorials/encode.html">ENSEMBL</a>.</div>
<div>At <a href="https://www.encodeproject.org/matrix/?type=Experiment">https://www.encodeproject.org/matrix/?type=Experiment</a> you can see 173 cell lines, 148 tissues and a number of cancer samples, covering more than a dozen kinds of high-throughput sequencing assays, including ChIP-seq and DNase-seq.</div>
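<p>For scripted access, the same kind of query can be expressed as a URL. The sketch below only builds the URL and makes no request. The <code>/search/</code> path and the <code>format</code> and <code>limit</code> parameters are assumptions based on the portal's REST interface and do not come from this post, so check them against the portal's own help pages before relying on them.</p>

```python
from urllib.parse import urlencode

PORTAL = "https://www.encodeproject.org"


def encode_search_url(**filters):
    """Build a portal search URL that asks for JSON instead of HTML.

    Assumption: the portal accepts type, format and limit as query
    parameters on its /search/ page.
    """
    params = {"type": "Experiment", **filters, "format": "json"}
    return f"{PORTAL}/search/?{urlencode(params)}"


if __name__ == "__main__":
    # Ask for the first ten experiments only.
    print(encode_search_url(limit=10))
```

<p>Using <code>urlencode</code> keeps the parameter escaping out of hand-written strings, which matters once filter values contain spaces.</p>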
<div><img class="alignnone  wp-image-1827" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/2.png" alt="2" width="641" height="359" /></div>
<p><b><span style="color: #ff0000;">Next, GEO:</span></b></p>
<div>GEO lists every ENCODE-related GSE study directly: <a href="http://www.ncbi.nlm.nih.gov/geo/info/ENCODE.html">http://www.ncbi.nlm.nih.gov/geo/info/ENCODE.html</a></div>
<div>There is not much to say about GEO: open the study page and download the data. This is another route I like, because GEO describes each experiment in detail.</div>
<p><b><span style="color: #ff0000;">Then, the ENCODE data hosted by the Broad Institute:</span></b></p>
<div>The renowned Broad Institute seems to be the most comprehensive resource site in bioinformatics. It hosts not only all of the ENCODE data but also the software and tools it used to analyze them.</div>
<div>
<div><a href="http://www.broadinstitute.org/~anshul/projects/encode">http://www.broadinstitute.org/~anshul/projects/encode</a></div>
</div>
<div>The raw data are at: <a href="http://www.broadinstitute.org/~anshul/projects/encode/rawdata/">http://www.broadinstitute.org/~anshul/projects/encode/rawdata/</a></div>
<div><img class="alignnone size-full wp-image-1828" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/3.png" alt="3" width="338" height="444" /></div>
<div></div>
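<div><b><span style="color: #ff0000;">Next, the data held by IHEC:</span></b></div>
<div>
<div><a href="http://epigenomesportal.ca/ihec/download.html">http://epigenomesportal.ca/ihec/download.html</a></div>
</div>
<div>This was the first time I had come across this data interface. It also lets you browse folders and files directly and download whatever you need:</div>
<div>Besides the ENCODE data, data from the Blueprint and Roadmap projects can also be downloaded.</p>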
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<th>Consortium</th>
<th>Date</th>
<th>Policies</th>
</tr>
<tr>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=1&amp;hubId=">CEEHRC</a></td>
<td>2014-09-18</td>
<td><a href="http://epigenomesportal.ca/edcc/CEEHRC_policies.html">Click here for policies</a></td>
</tr>
<tr>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=3&amp;hubId=">Blueprint</a></td>
<td>2014-08-11</td>
<td><a href="http://www.blueprint-epigenome.eu/index.cfm?p=A73C2005-A2D3-0FDE-EF9BC386B65DF073">Click here for policies</a></td>
</tr>
<tr>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=4&amp;hubId=">ENCODE</a></td>
<td>2011-01</td>
<td><a href="http://encodeproject.org/ENCODE/terms.html">Click here for policies</a></td>
</tr>
<tr>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=5&amp;hubId=">NIH Roadmap</a></td>
<td>2014-05-29</td>
<td><a href="http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/">Click here for policies</a></td>
</tr>
<tr>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=6&amp;hubId=">DEEP</a></td>
<td>2014-08-15</td>
<td><a href="http://www.deutsches-epigenom-programm.de/data-access/">Click here for policies</a></td>
</tr>
<tr>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=7&amp;hubId=">CREST JST</a></td>
<td>2014-09-12</td>
<td><a href="http://epigenome.cbrc.jp/ihec/crest-data-release-policy">Click here for policies</a></td>
</tr>
<tr>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=8&amp;hubId=">KNIH</a></td>
<td>2015-07-15</td>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;hubId=">Click here for policies</a></td>
</tr>
</tbody>
</table>
</div>
<p><b><span style="color: #ff0000;">Finally, ENSEMBL:</span></b></p>
<div>I could not find a direct download address; see <a href="http://asia.ensembl.org/info/website/tutorials/encode.html">http://asia.ensembl.org/info/website/tutorials/encode.html</a></div>
<div>
<p>The full ENCODE datasets that were used in the Ensembl regulatory build can also be viewed in the Ensembl GrCh37 archive, by attaching a track hub to Region in Detail - the link below will do this automatically:</p>
<p><a href="http://grch37.ensembl.org/Homo_sapiens/Location/View?g=ENSG00000130544;contigviewbottom=url:http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hub.txt;format=TRACKHUB;menu=ENCODE%20data#modal_config_viewbottom-ENCODE_data_uniformHistonePeaks" target="_blank">Link to add ENCODE integrative analysis hub</a></p>
<p>This creates a menu in the Control Panel on Region in Detail, from which you can add individual tracks or groups of tracks using matrix selectors. Cell type and experimental factor are the two principal axes; other dimensions can be selected by clicking on a box to open an additional submenu (see below).</p>
</div>
<div>If you are not very familiar with the ENCODE project, start with some tutorials:</div>
<div>
<div>ENCODE tutorials provided by NIH: <a href="https://www.genome.gov/27553900/encode-tutorials/">https://www.genome.gov/27553900/encode-tutorials/</a></div>
<div></div>
<div><a href="https://www.genome.gov/27562350/encode-workshop-april-2015-keystone-symposia/">https://www.genome.gov/27562350/encode-workshop-april-2015-keystone-symposia/</a></div>
<div><a href="https://www.genome.gov/27561253/encode-workshop-tutorial-october-2014-ashg/">https://www.genome.gov/27561253/encode-workshop-tutorial-october-2014-ashg/</a></div>
<div><a href="https://www.genome.gov/27553901/encode-tutorial-may-2013-biology-of-genomes-cshl/">https://www.genome.gov/27553901/encode-tutorial-may-2013-biology-of-genomes-cshl/</a></div>
<div></div>
<div><a href="https://www.genome.gov/27563006/encoderoadmap-epigenomics-tutorial-october-2015-ashg/">https://www.genome.gov/27563006/encoderoadmap-epigenomics-tutorial-october-2015-ashg/</a></div>
<div><a href="https://www.genome.gov/27555330/encoderoadmap-epigenomics-tutorial-october-2013-ashg/">https://www.genome.gov/27555330/encoderoadmap-epigenomics-tutorial-october-2013-ashg/</a></div>
<div><a href="https://www.genome.gov/27551933/encoderoadmap-epigenomics-tutorial-nov-2012-ashg/">https://www.genome.gov/27551933/encoderoadmap-epigenomics-tutorial-nov-2012-ashg/</a></div>
<div></div>
<div><a href="http://useast.ensembl.org/info/website/tutorials/encode.html">http://useast.ensembl.org/info/website/tutorials/encode.html</a></div>
<div></div>
<div></div>
<div><a href="https://www.encodeproject.org/tutorials/">https://www.encodeproject.org/tutorials/</a></div>
<div><a href="https://www.encodeproject.org/tutorials/encode-meeting-2016/">https://www.encodeproject.org/tutorials/encode-meeting-2016/</a></div>
<div><a href="https://www.encodeproject.org/tutorials/encode-users-meeting-2015/">https://www.encodeproject.org/tutorials/encode-users-meeting-2015/</a></div>
</div>
<div></div>
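<div>The Encyclopedia of DNA Elements (ENCODE) project aims to describe all of the functional sequence elements encoded in the human genome. Launched in September 2003, it drew more than 440 researchers from 32 institutions in five countries (the United States, the United Kingdom, Spain, Japan and Singapore). Over nine years of work they <span style="color: #ff0000;"><b>studied 147 tissue types, ran 1,478 experiments, generated and analyzed more than 15 terabytes of raw data, and identified 4 million gene switches,</b></span> clarifying which DNA segments can turn specific genes on or off and how these "switches" differ between cell types. The results suggest that so-called "junk DNA" is largely useful and carries much of the work of gene regulation, and that few if any DNA segments in the human body can be dismissed as useless.</div>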
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1825.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>You Do Not Need the R Package GEOquery</title>
		<link>http://www.bio-info-trainee.com/1571.html</link>
		<comments>http://www.bio-info-trainee.com/1571.html#comments</comments>
		<pubDate>Thu, 14 Apr 2016 11:40:13 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[生信基础]]></category>
		<category><![CDATA[GEO]]></category>
		<category><![CDATA[get]]></category>
		<category><![CDATA[url]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1571</guid>
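		<description><![CDATA[I have written before about how to use GEOquery and GEOmetadb. They are indeed powerful, and also very &#8230; <a href="http://www.bio-info-trainee.com/1571.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I have written before about how to use GEOquery and GEOmetadb. They are indeed powerful and easy to use, and they save a lot of effort when building microarray pipelines. Recently, though, many friends have reported network problems with them, with downloads often failing.</p>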
<ul>
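<li><a title="Read more: Using the GEOmetadb Package to Retrieve Experiment Metadata for GEO Data" href="http://www.bio-info-trainee.com/1085.html" rel="bookmark">Using the GEOmetadb Package to Retrieve Experiment Metadata for GEO Data</a></li>
<li><a title="Read more: Downloading Matrix Data from GEO, Ready for Downstream Analysis" href="http://www.bio-info-trainee.com/941.html" rel="bookmark">Downloading Matrix Data from GEO, Ready for Downstream Analysis</a></li>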
</ul>
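<p>To work around this, I took another careful look at GEO. The official site itself offers a web API, so you can download exactly the data you need.</p>
<p>When we use GEO data, all we really want is to go from a study ID (for example GSE1009) to its raw CEL files, its expression matrix, or its sample grouping information.</p>
<p>To do this with the R package GEOquery, see my <a href="https://github.com/bioconductor-china/software/blob/master/GEO_jmzeng.md">guide</a>.</p>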
<p>In fact, for the raw CEL files and the rest, you can simply assemble the URLs yourself:</p>
<blockquote><p>ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1009/matrix/GSE1009_series_matrix.txt.gz</p>
<p>## Expression matrix: read it in R, skipping the annotation lines; fields are tab-separated</p>
<p>ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1009/suppl/GSE1009_RAW.tar</p>
<p>## Raw array data: read with the affy package</p>
<p>http://www.ncbi.nlm.nih.gov/geo/browse/?view=samples&amp;series=1009&amp;<span style="color: #ff0000;">mode=csv</span></p>
<p><span style="color: #ff0000;">### Sample grouping information</span></p></blockquote>
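<p>The note about skipping the annotation lines and splitting on tabs can be sketched as follows, assuming the usual series-matrix layout in which annotation lines start with <code>!</code>. The function names are invented for this example, and the sample text is a tiny made-up stand-in, not real GSE1009 data.</p>

```python
import csv
import gzip


def parse_series_matrix(text):
    """Split series-matrix text into annotation lines and the expression table."""
    metadata, table_lines = [], []
    for line in text.splitlines():
        if line.startswith("!"):
            metadata.append(line)      # annotation lines start with '!'
        elif line.strip():
            table_lines.append(line)   # header row plus one row per probe
    rows = list(csv.reader(table_lines, delimiter="\t"))
    header, data = rows[0], rows[1:]
    return metadata, header, data


def read_series_matrix(path):
    """Read a downloaded *_series_matrix.txt.gz file from disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return parse_series_matrix(handle.read())


if __name__ == "__main__":
    # Tiny made-up sample in the same layout, for illustration only.
    sample = (
        '!Series_geo_accession\t"GSE1009"\n'
        "!series_matrix_table_begin\n"
        '"ID_REF"\t"GSM_A"\t"GSM_B"\n'
        '"probe_1"\t1.5\t2.5\n'
        "!series_matrix_table_end\n"
    )
    metadata, header, data = parse_series_matrix(sample)
    print(header)
    print(data)
```

<p>The values come back as strings; convert them to numbers in whatever way suits the downstream analysis.</p>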
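<p>For any study ID these URLs are very easy to assemble, and between them they cover what GEOquery is used for here.</p>
<p>If the study has many samples, you can also use the file list below to fetch samples selectively.</p>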
<p>ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1009/suppl/filelist.txt</p>
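<p>Assembling these URLs from a study ID can be sketched like this. The function name is hypothetical, and the <code>nnn</code> bucket rule is inferred from the GSE1009 example above rather than from a specification.</p>

```python
import re

FTP_ROOT = "ftp://ftp.ncbi.nlm.nih.gov/geo/series"


def series_urls(gse):
    """Return the matrix, raw-archive and file-list URLs for one GSE ID."""
    match = re.fullmatch(r"GSE(\d+)", gse)
    if match is None:
        raise ValueError(f"not a GSE accession: {gse!r}")
    digits = match.group(1)
    # Assumption: the bucket directory replaces the last three digits with
    # 'nnn', as in GSE1009 under GSE1nnn.
    base = f"{FTP_ROOT}/GSE{digits[:-3]}nnn/{gse}"
    return {
        "matrix": f"{base}/matrix/{gse}_series_matrix.txt.gz",
        "raw": f"{base}/suppl/{gse}_RAW.tar",
        "filelist": f"{base}/suppl/filelist.txt",
    }


if __name__ == "__main__":
    for name, url in series_urls("GSE1009").items():
        print(name, url)
```

<p>Studies that span several platforms may name their matrix files differently, so it is worth checking the directory listing before downloading in bulk.</p>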
<p>All you need to understand is an ordinary browser GET request: join the strings below into one complete URL.</p>
<blockquote>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/browse/">http://www.ncbi.nlm.nih.gov/geo/browse/</a>?</div>
<div>view=series&amp;   ## four options</div>
<div>zsort=date&amp;</div>
<div>mode=csv&amp;    ## important: downloads a CSV file directly</div>
<div>page=$i&amp;</div>
<div>display=5000    ## important</div>
<div></div>
<div>To check the total count: curl --silent "<a href="http://www.ncbi.nlm.nih.gov/geo/browse/" target="_blank">http://www.ncbi.nlm.nih.gov/geo/browse/</a>" | grep "total_count"</div>
</blockquote>
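<p>Joining those pieces, and working out how many pages to request, might look like the sketch below. The helper names are made up, and the total record count is assumed to be supplied by the caller, for instance from the <code>total_count</code> value found with the curl command above.</p>

```python
from urllib.parse import urlencode

BROWSE = "http://www.ncbi.nlm.nih.gov/geo/browse/"


def browse_url(view="series", page=1, display=5000):
    """Build the CSV download URL for one page of the GEO browse listing."""
    params = {
        "view": view,
        "zsort": "date",
        "mode": "csv",
        "page": page,
        "display": display,
    }
    return f"{BROWSE}?{urlencode(params)}"


def page_count(total, display=5000):
    """Number of pages needed to cover `total` records."""
    return -(-total // display)  # ceiling division


if __name__ == "__main__":
    # 12001 is an arbitrary example total, not a real GEO count.
    for page in range(1, page_count(12001) + 1):
        print(browse_url(page=page))
```

<p>Each printed URL corresponds to one value of <code>$i</code> in the parameter list above.</p>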
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1571.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Downloading Microarray Data from EBI's ArrayExpress Database with an R Package</title>
		<link>http://www.bio-info-trainee.com/1432.html</link>
		<comments>http://www.bio-info-trainee.com/1432.html#comments</comments>
		<pubDate>Thu, 03 Mar 2016 14:13:26 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[arrayexpress]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[GEO]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1432</guid>
		<description><![CDATA[This package is not very different from GEOquery: one targets NCBI's GEO database, and one &#8230; <a href="http://www.bio-info-trainee.com/1432.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>This package is not very different from GEOquery: one targets NCBI's GEO database, the other EBI's ArrayExpress database. It is really only needed by people writing automated scripts. Individual analysts usually search the database home page themselves, collect the download links, and download the files one by one.</div>
<div>Downloading microarray data from EBI's ArrayExpress database:</div>
<div>Home page: <a href="https://www.ebi.ac.uk/arrayexpress/">https://www.ebi.ac.uk/arrayexpress/</a></div>
<div>Statistics as of 2016-03-01 11:41:27:</div>
<div>63890 experiments</div>
<div>1912744 assays</div>
<div>40.53 TB of archived data, which is quite a lot</div>
<div>All of the data can be downloaded from the FTP server: <a href="ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/experiment/BUGS/">ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/experiment/BUGS/</a></div>
<div>It is stored neatly by ID.</div>
<div>You can also use an R package, ArrayExpress:</div>
<div>Manual: <a href="https://bioconductor.org/packages/release/bioc/vignettes/ArrayExpress/inst/doc/ArrayExpress.pdf">https://bioconductor.org/packages/release/bioc/vignettes/ArrayExpress/inst/doc/ArrayExpress.pdf</a></div>
<div>The package is described in this paper: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2723004/">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2723004/</a></div>
<div>That was 2009, when few people used R and even a simple package like this could be published as a paper, which seems hard to believe today.</div>
<div></div>
<div>In fact most of the data mirror GEO. For example, <a href="https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-55645/">https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-55645/</a> corresponds to GEO series GSE55645.</div>
<div>As another example, a search for NASH expression data, <a href="https://www.ebi.ac.uk/arrayexpress/search.html?query=NASH++expression">https://www.ebi.ac.uk/arrayexpress/search.html?query=NASH++expression</a>, returns 30 results, of which only 4 are unique to ArrayExpress.</div>
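<div>The correspondence between the two databases can be sketched as a pair of small helpers. The function names are invented for this example, and the FTP folder rule is an assumption inferred from the BUGS directory shown above, so confirm it against the server before relying on it.</div>

```python
import re

AE_FTP = "ftp://ftp.ebi.ac.uk/pub/databases/arrayexpress/data/experiment"


def geo_accession(ae_accession):
    """Map an E-GEOD-* accession to its GSE ID, or None if it is not a GEO import."""
    match = re.fullmatch(r"E-GEOD-(\d+)", ae_accession)
    return f"GSE{match.group(1)}" if match else None


def ftp_directory(ae_accession):
    """Guess the FTP directory of an ArrayExpress experiment.

    Assumption: the folder under experiment/ is the four-letter code in the
    middle of the accession (BUGS, GEOD, MEXP and so on).
    """
    match = re.fullmatch(r"E-([A-Z]{4})-\d+", ae_accession)
    if match is None:
        raise ValueError(f"unexpected accession: {ae_accession!r}")
    return f"{AE_FTP}/{match.group(1)}/{ae_accession}/"


if __name__ == "__main__":
    for acc in ["E-GEOD-55645", "E-MEXP-3291"]:
        print(acc, geo_accession(acc), ftp_directory(acc))
```

<div>Filtering a result list with <code>geo_accession</code> is one way to separate GEO imports from experiments that exist only in ArrayExpress.</div>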
<div>if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")</div>
<div>BiocManager::install("ArrayExpress")</div>
<div>library(ArrayExpress)</div>
<div>Web search: <a href="https://www.ebi.ac.uk/arrayexpress/search.html?query=NASH++expression+Homo+sapiens" target="_blank">https://www.ebi.ac.uk/arrayexpress/search.html?query=NASH++expression+Homo+sapiens</a></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/1.png"><img class="alignnone size-full wp-image-1433" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/1.png" alt="1" width="761" height="593" /></a></div>
<div>To run the same search from R:</div>
<div>sets = queryAE(keywords = "NASH+expression", species = "homo+sapiens")</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/2.png"><img class="alignnone size-full wp-image-1434" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/2.png" alt="2" width="731" height="303" /></a></div>
<div>The result is the same.</div>
<div>To download the data:</div>
<div>back = getAE("E-MEXP-3291")</div>
<div>Downloading amounts to this: the object stores the links, and R's download function is called on them directly.</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/3.png"><img class="alignnone size-full wp-image-1435" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/3.png" alt="3" width="784" height="270" /></a></div>
<div>Usually there is no need to download the raw data files yourself. The function below returns a data object from which you can get the expression matrix and the experiment metadata directly.</div>
<p>rawset = ArrayExpress("E-MEXP-3291")</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1432.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using the GEOmetadb Package to Retrieve Experiment Metadata for GEO Data</title>
		<link>http://www.bio-info-trainee.com/1085.html</link>
		<comments>http://www.bio-info-trainee.com/1085.html#comments</comments>
		<pubDate>Thu, 29 Oct 2015 02:30:27 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[GEO]]></category>
		<category><![CDATA[GEOmetadb]]></category>
		<category><![CDATA[metadat]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1085</guid>
		<description><![CDATA[In theory, the GEOquery package I mentioned earlier can take a GSE accession and fetch the NCBI-provided &#8230; <a href="http://www.bio-info-trainee.com/1085.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In theory, the GEOquery package I mentioned earlier can take a GSE accession and fetch everything NCBI provides for it, including the metadata, the expression matrix, the soft file, and the raw data.</p>
<p>Often, though, that metadata is not very tidy, and downloading records one at a time is tedious. That is where another handy Bioconductor package for R comes in: GEOmetadb.</p>
<p>Its example code: <a href="http://bioconductor.org/packages/devel/bioc/vignettes/GEOmetadb/inst/doc/GEOmetadb.R" target="_blank">http://bioconductor.org/packages/devel/bioc/vignettes/GEOmetadb/inst/doc/GEOmetadb.R</a></p>
<div>Its home page: <a href="http://bioconductor.org/packages/devel/bioc/html/GEOmetadb.html" target="_blank">http://bioconductor.org/packages/devel/bioc/html/GEOmetadb.html</a></div>
<div>It covers a fair amount of database basics.</div>
<div>The code is hosted on GitHub, and its example code connects to the database like this:</div>
<div>
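<pre># Load the package
library(GEOmetadb)
# Download the SQLite file only if it is not already in the working directory
if(!file.exists('GEOmetadb.sqlite')) getSQLiteFile()
file.info('GEOmetadb.sqlite')
# Open a connection to the local database, then close it when finished
con &lt;- dbConnect(SQLite(),'GEOmetadb.sqlite')
dbDisconnect(con)</pre>
<div>In practice this usually fails: the package keeps its GEOmetadb.sqlite file on an overseas file-sharing service that is hard to reach from within China, so I recommend finding a way to download the file to your own machine.</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/10/tmp2.png"><img class="alignnone size-full wp-image-1086" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/10/tmp2.png" alt="tmp2" width="641" height="213" /></a></div>
<div>The code in the screenshot above then works. You need to download the GEOmetadb.sqlite file yourself and put it in a directory of your choice; /path/GEOmetadb.sqlite is a placeholder that you must change.</div>
<div>Our diabetes.GEO.list file contains:</div>
<pre>GSE1009
GSE10785
GSE1133
GSE11975
GSE121
GSE12409</pre>
<div>The resulting table is shown below. It has 32 columns in total, which is fairly comprehensive.</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/10/tmp1.png"><img class="alignnone  wp-image-1087" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/10/tmp1.png" alt="tmp" width="752" height="501" /></a></div>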
</div>
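<p>Because GEOmetadb.sqlite is an ordinary SQLite file, the lookup for a list of study IDs can also be sketched outside R. This is an illustrative Python sketch, not the code from the screenshot above. The function names are made up, and the table name <code>gse</code> with an accession column also called <code>gse</code> is an assumption about the database schema that should be confirmed by listing the tables first.</p>

```python
import sqlite3


def fetch_gse_metadata(con, gse_ids):
    """Return one dict per row of the `gse` table that matches the given IDs.

    Assumption: the database has a table named gse whose accession
    column is also named gse.
    """
    placeholders = ",".join("?" for _ in gse_ids)
    query = f"SELECT * FROM gse WHERE gse IN ({placeholders}) ORDER BY gse"
    con.row_factory = sqlite3.Row  # rows then behave like mappings
    return [dict(row) for row in con.execute(query, list(gse_ids))]


def read_id_list(path):
    """Read accessions from a text file with one ID per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

<p>A typical call would open the local file with <code>sqlite3.connect("/path/GEOmetadb.sqlite")</code>, read the IDs with <code>read_id_list("diabetes.GEO.list")</code>, and pass both to <code>fetch_gse_metadata</code>. The query uses placeholders rather than string concatenation so that the IDs are never interpreted as SQL.</p>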
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1085.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
