<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; UCSC</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/ucsc/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>You are welcome to join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee.</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Four ways to download CpG Islands annotation files</title>
		<link>http://www.bio-info-trainee.com/2141.html</link>
		<comments>http://www.bio-info-trainee.com/2141.html#comments</comments>
		<pubDate>Thu, 15 Dec 2016 11:25:31 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[Bioinformatics basics]]></category>
		<category><![CDATA[CpG Islands]]></category>
		<category><![CDATA[UCSC]]></category>
		<category><![CDATA[Coordinate records]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2141</guid>
		<description><![CDATA[This is the question readers write in about most: how to download the start and end coordinates of particular regions of the genome, genomi &#8230; <a href="http://www.bio-info-trainee.com/2141.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
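				<content:encoded><![CDATA[<p>This is the question readers write in about most: how to download the start and end coordinates of particular regions of the genome. These are genomic feature files, usually in GTF or BED format: for example, the coordinates of all exons on human hg19, of all genes, of all lncRNAs, rRNAs, and so on. Here I use four ways of downloading the CpG Islands file as an example:<span id="more-2141"></span></p>
<div>First make sure you understand a few concepts: CpG islands (CpGI), CpG shores, and CpG shelf regions.</div>
<div>The simplest option, and my first recommendation, is the UCSC Table Browser (<a href="https://genome-euro.ucsc.edu/cgi-bin/hgTables">https://genome-euro.ucsc.edu/cgi-bin/hgTables</a>), with the output written in BED format (plain text).</div>
<div>BED (Browser Extensible Data) format provides a flexible way to define the data lines that are displayed in an annotation track.</div>
<div>Below is a simple example that retrieves the CpG island coordinates for mm10; the file is generated on the fly from your selections:</div>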
<div>
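<div>As you have probably worked out, the options in the Table Browser form (see the screenshot) can be combined freely to download all kinds of annotation files, including gene, exon, and transcript coordinates, and so on.</div>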
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/31.png"><img class="alignnone size-full wp-image-2142" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/31.png" alt="3" width="725" height="502" /></a></div>
<div>The second way is to find the file directly on the download server, <a href="http://hgdownload.soe.ucsc.edu/downloads.html">http://hgdownload.soe.ucsc.edu/downloads.html</a>. Click on "Human", then "Annotation Database", and finally "cpgIslandExt.txt.gz". In practice you only need to change the assembly name in the URL:</div>
<div><a href="http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/">http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/</a></div>
<div><a href="http://hgdownload.soe.ucsc.edu/goldenPath/mm9/database/">http://hgdownload.soe.ucsc.edu/goldenPath/mm9/database/</a></div>
<div><a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/">http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/</a></div>
<div><a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/">http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/</a></div>
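<div>Just search for the file in the directory listing. You can see that both methods give the same data, and that mouse has far fewer annotated CpG islands than human. My first guess was that mouse is simply less thoroughly studied, but since UCSC predicts these islands from the genome sequence itself, the difference more likely reflects the two genomes than the amount of research.</div>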
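<div>The "just change the URL" idea can be sketched in a few lines of Python. This is only an illustration: the helper names <code>table_url</code> and <code>row_to_bed</code> are made up, the example row is invented, and the column order (bin, chrom, chromStart, chromEnd, name, ...) is an assumption that should be checked against the table's .sql file in the same directory.</div>
<pre>
```python
# Sketch: build the goldenPath URL of a table dump by swapping the assembly name.
BASE = "http://hgdownload.soe.ucsc.edu/goldenPath"


def table_url(assembly, table="cpgIslandExt"):
    """Return the URL of the gzipped table dump for one assembly."""
    return "{}/{}/database/{}.txt.gz".format(BASE, assembly, table)


def row_to_bed(line):
    """Convert one tab-separated cpgIslandExt row to a 4-column BED line.

    Assumed column order: bin, chrom, chromStart, chromEnd, name, ...
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, name = fields[1:5]
    return "\t".join([chrom, start, end, name])


for assembly in ["mm10", "mm9", "hg19", "hg38"]:
    print(table_url(assembly))

# A made-up row, for illustration only.
example = "585\tchr1\t3531624\t3531843\tCpG: 27\t219\t27\t167\t24.7\t76.3\t0.86"
print(row_to_bed(example))
```
</pre>
<div>After downloading and decompressing a dump, each line can be passed through <code>row_to_bed</code> to get the same kind of BED output the Table Browser produces.</div>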
<div></div>
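<div>The BioMart interface of the Ensembl database can of course do the same thing,<br />
and finally, BioMart also has an R package, biomaRt, that works too.</div>
<div>That covers all four methods!</div>
<div>In addition, I strongly recommend the genomic-feature packages in R: they are easy to learn and pay off enormously!</div>
<div>In essence, it all comes down to understanding TxDb and GenomicRanges objects.</div>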
<blockquote>
<div>##　https://www.bioconductor.org/packages/devel/data/annotation/?TxDb<br />
?GenomicRanges</p>
<p>library(TxDb.Mmusculus.UCSC.mm10.knownGene)<br />
library(TxDb.Hsapiens.UCSC.hg19.knownGene)<br />
library(EnsDb.Hsapiens.v75)<br />
library(EnsDb.Mmusculus.v79)<br />
ls('package:EnsDb.Mmusculus.v79')</p>
<p>library(BSgenome.Hsapiens.UCSC.hg19.masked)<br />
library(BSgenome.Hsapiens.UCSC.hg19)</p>
<p># Extract all mouse genes from the Ensembl annotation as a GRanges object<br />
annoData &lt;- genes(EnsDb.Mmusculus.v79)<br />
annoData[1:2]; length(annoData)<br />
ranges(annoData[1:2])</p>
<p>txdb &lt;- TxDb.Mmusculus.UCSC.mm10.knownGene<br />
txdb_dump &lt;- as.list(txdb)<br />
txdb_dump$genes</p>
</div>
<div></div>
</blockquote>
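<div>To get the CpG shore regions, take the 2000 bp flanking each CpG island: subtract 2000 from the island start for the upstream shore, and add 2000 to the island end for the downstream shore.</div>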
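<div>A minimal Python sketch of the shore calculation, assuming BED-style coordinates (0-based start, exclusive end). The function name <code>cpg_shores</code> and the example island are hypothetical, and clipping at the chromosome end is left out because it needs a chrom.sizes file.</div>
<pre>
```python
def cpg_shores(chrom, start, end, flank=2000):
    """Return the upstream and downstream shore intervals of one CpG island.

    Coordinates are BED-style (0-based start, exclusive end). The upstream
    shore is clipped at 0; clipping at the chromosome end is not handled.
    """
    upstream = (chrom, max(0, start - flank), start)
    downstream = (chrom, end, end + flank)
    return [upstream, downstream]


island = ("chr1", 3531624, 3531843)  # made-up island, for illustration only
for shore in cpg_shores(*island):
    print("\t".join(str(x) for x in shore))
```
</pre>
<div>Shelves could be derived the same way by applying a further 2000 bp flank to the outer edge of each shore.</div>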
<div></div>
<div></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2141.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Six ways to download all the data from the ENCODE project</title>
		<link>http://www.bio-info-trainee.com/1825.html</link>
		<comments>http://www.bio-info-trainee.com/1825.html#comments</comments>
		<pubDate>Thu, 28 Jul 2016 14:50:00 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[CHIP-seq]]></category>
		<category><![CDATA[broad]]></category>
		<category><![CDATA[encode]]></category>
		<category><![CDATA[GEO]]></category>
		<category><![CDATA[iHEC]]></category>
		<category><![CDATA[UCSC]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1825</guid>
		<description><![CDATA[The Encyclopedia of DNA Elements (ENCODE),  &#8230; <a href="http://www.bio-info-trainee.com/1825.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
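				<content:encoded><![CDATA[<div>I will not belabor the importance of the Encyclopedia of DNA Elements (ENCODE) project. If you are not yet familiar with it, skip to the end of this post for links to the ENCODE tutorials and study them carefully. The project used the following high-throughput sequencing technologies to characterize genome-wide gene regulatory elements in more than 100 different cell lines and tissues. It originally targeted only human; similar data were later generated and analyzed for model organisms as well, namely mouse (Mouse ENCODE) and fly and worm (modENCODE).</div>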
<blockquote>
<div>
<ul>
<li>chromatin structure (5C)</li>
<li>open chromatin (DNase-seq and FAIRE-seq)</li>
<li>histone modifications and DNA-binding of over 100 transcription factors (ChIP-seq)</li>
<li>RNA transcription (RNAseq and CAGE)</li>
</ul>
</div>
</blockquote>
<p><span id="more-1825"></span></p>
<div>All of the data are now public (<a href="http://genome.ucsc.edu/ENCODE/">http://genome.ucsc.edu/ENCODE/</a>); <b><em>ENCODE results from 2007 and later are available from the ENCODE Project Portal</em></b>, <a href="https://encodeproject.org/" target="_blank">encodeproject.org</a>. The results were published simultaneously as 30 papers in Nature, Science, Cell, JBC, Genome Biol, and Genome Research (<a href="http://www.nature.com/encode">http://www.nature.com/encode</a>).</div>
<div><b><span style="color: #ff0000;">Everything can be downloaded, from the raw sequencing data to the aligned signal files and the called, meaningful peak files.</span></b></div>
<div><span style="color: #555555; font-family: 'Microsoft YaHei';">Based on my own experience, here is a brief introduction to some</span> ways of downloading ENCODE data: <b><span style="color: #ff0000;">the ENCODE portal, UCSC, ENSEMBL, the Broad Institute, IHEC, and GEO, six routes in all!</span></b></div>
<div></div>
<div><span style="color: #ff0000;"><b>First, UCSC:</b></span></div>
<div>The address is <a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/</a>. Because you browse the files directly, you can pick out whatever interests you by folder and file name and download it any way you like, which is why this route suits me best.</div>
<div>Many of you are probably used to visualizing ChIP-seq results with the UCSC Genome Browser, which has a great many options for showing public datasets alongside your own data for comparison. It must therefore have a download server that stores these datasets, and among the best known are the ENCODE data, as shown below:</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/1.png"><img class="alignnone size-full wp-image-1826" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/1.png" alt="1" width="471" height="393" /></a></div>
<div>I am mostly interested in the ENCODE histone data, so click through to that folder!</div>
<div>In general, the data for every histone mark in each cell line can be downloaded, from the sequencing reads to the aligned BAM files and the called peaks!</div>
<div></div>
<p><b><span style="color: #ff0000;">Next, the official ENCODE portal:</span></b></p>
<div>The ENCODE portal also documents its data-processing pipelines: <a href="https://www.encodeproject.org/pipelines/">https://www.encodeproject.org/pipelines/</a></div>
<div>
<div>RNA-seq pipelines</div>
<div>RAMPAGE pipeline</div>
<div>Chromatin pipelines(Histone ChIP-seq Pipeline/Transcription Factor ChIP-seq Pipeline)</div>
<div>Methylation pipeline(WGBS Pipeline Overview)</div>
</div>
<div>Downloading from the portal works like an online shop: add the datasets you need to a cart, then download them all together.</div>
<div>This document describes what <a href="https://www.encodeproject.org/help/getting-started/#info">data are available at the ENCODE Portal</a>, ways to <a href="https://www.encodeproject.org/help/getting-started/#use">get started searching and downloading data</a>, and an overview to how the <a href="https://www.encodeproject.org/help/getting-started/#organization">metadata describing the assays and reagents are organized</a>. ENCODE data can be visualized and accessed from <a href="https://www.encodeproject.org/about/data-access#other">other resources</a>, including the <a href="http://genome.ucsc.edu/ENCODE/">UCSC Genome Browser</a> and <a href="http://www.ensembl.org/info/website/tutorials/encode.html">ENSEMBL</a>.</div>
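<div>At <a href="https://www.encodeproject.org/matrix/?type=Experiment">https://www.encodeproject.org/matrix/?type=Experiment</a> you can see that it lists 173 cell lines, 148 tissues, and a number of cancer samples, covering more than a dozen kinds of high-throughput sequencing data, including ChIP-seq and DNase-seq.</div>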
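<div>Portal pages like the matrix view are ordinary URLs with query parameters, so they are easy to build in a script. The Python sketch below only constructs URLs and does not contact the portal. The helper name <code>portal_url</code> is hypothetical, and the <code>search</code> view with <code>format=json</code> is my assumption about the portal's REST interface; verify it against the portal's own help pages before relying on it.</div>
<pre>
```python
from urllib.parse import urlencode

PORTAL = "https://www.encodeproject.org"


def portal_url(view, **params):
    """Build a portal URL such as /matrix/ or /search/ with query parameters."""
    return "{}/{}/?{}".format(PORTAL, view, urlencode(params))


# The matrix page mentioned in the text.
print(portal_url("matrix", type="Experiment"))

# Assumed JSON search endpoint; check the portal documentation before use.
print(portal_url("search", type="Experiment", format="json"))
```
</pre>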
<div><img class="alignnone  wp-image-1827" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/2.png" alt="2" width="641" height="359" /></div>
<p><b><span style="color: #ff0000;">Next, the GEO database:</span></b></p>
<div>This page lists every ENCODE-related GSE study: <a href="http://www.ncbi.nlm.nih.gov/geo/info/ENCODE.html">http://www.ncbi.nlm.nih.gov/geo/info/ENCODE.html</a></div>
<div>There is not much to say about GEO: go to the study page and download the data. This is another download route I like, because GEO describes each experiment in detail.</div>
<p><b><span style="color: #ff0000;">Then the ENCODE data hosted by the Broad Institute:</span></b></p>
<div>The renowned Broad Institute seems to be one of the most comprehensive bioinformatics resource sites. It hosts not only the ENCODE data but also the software and tools it used to analyze them.</div>
<div>
<div><a href="http://www.broadinstitute.org/~anshul/projects/encode">http://www.broadinstitute.org/~anshul/projects/encode</a></div>
</div>
<div>The raw data are at: <a href="http://www.broadinstitute.org/~anshul/projects/encode/rawdata/">http://www.broadinstitute.org/~anshul/projects/encode/rawdata/</a></div>
<div><img class="alignnone size-full wp-image-1828" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/3.png" alt="3" width="338" height="444" /></div>
<div></div>
<div>Next, the data stored by IHEC:</div>
<div>
<div><a href="http://epigenomesportal.ca/ihec/download.html">http://epigenomesportal.ca/ihec/download.html</a></div>
</div>
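<div>This is the first time I have come across this data portal. It is also browsed directly as folders and files, so just download what you need:</div>
<div>Besides the ENCODE data, data from the Blueprint and Roadmap projects can also be downloaded.</p>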
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top">[DIR]</td>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=1&amp;hubId=">CEEHRC</a></td>
<td>2014-09-18</td>
<td><a href="http://epigenomesportal.ca/edcc/CEEHRC_policies.html">Click here for policies</a></td>
</tr>
<tr>
<td valign="top">[DIR]</td>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=3&amp;hubId=">Blueprint</a></td>
<td>2014-08-11</td>
<td><a href="http://www.blueprint-epigenome.eu/index.cfm?p=A73C2005-A2D3-0FDE-EF9BC386B65DF073">Click here for policies</a></td>
</tr>
<tr>
<td valign="top">[DIR]</td>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=4&amp;hubId=">ENCODE</a></td>
<td>2011-01</td>
<td><a href="http://encodeproject.org/ENCODE/terms.html">Click here for policies</a></td>
</tr>
<tr>
<td valign="top">[DIR]</td>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=5&amp;hubId=">NIH Roadmap</a></td>
<td>2014-05-29</td>
<td><a href="http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/">Click here for policies</a></td>
</tr>
<tr>
<td valign="top">[DIR]</td>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=6&amp;hubId=">DEEP</a></td>
<td>2014-08-15</td>
<td><a href="http://www.deutsches-epigenom-programm.de/data-access/">Click here for policies</a></td>
</tr>
<tr>
<td valign="top">[DIR]</td>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=7&amp;hubId=">CREST JST</a></td>
<td>2014-09-12</td>
<td><a href="http://epigenome.cbrc.jp/ihec/crest-data-release-policy">Click here for policies</a></td>
</tr>
<tr>
<td valign="top">[DIR]</td>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;i=8&amp;hubId=">KNIH</a></td>
<td>2015-07-15</td>
<td><a href="http://epigenomesportal.ca/ihec/download.html?as=1&amp;hubId=">Click here for policies</a></td>
</tr>
</tbody>
</table>
</div>
<p><b><span style="color: #ff0000;">Finally, the ENSEMBL database:</span></b></p>
<div>I could not find a direct download address; see <a href="http://asia.ensembl.org/info/website/tutorials/encode.html">http://asia.ensembl.org/info/website/tutorials/encode.html</a></div>
<div>
<p>The full ENCODE datasets that were used in the Ensembl regulatory build can also be viewed in the Ensembl GRCh37 archive, by attaching a track hub to Region in Detail - the link below will do this automatically:</p>
<p><a href="http://grch37.ensembl.org/Homo_sapiens/Location/View?g=ENSG00000130544;contigviewbottom=url:http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hub.txt;format=TRACKHUB;menu=ENCODE%20data#modal_config_viewbottom-ENCODE_data_uniformHistonePeaks" target="_blank">Link to add ENCODE integrative analysis hub</a></p>
<p>This creates a menu in the Control Panel on Region in Detail, from which you can add individual tracks or groups of tracks using matrix selectors. Cell type and experimental factor are the two principal axes; other dimensions can be selected by clicking on a box to open an additional submenu (see below).</p>
</div>
<div>If you are not very familiar with the ENCODE project, start with some tutorials:</div>
<div>
<div>ENCODE tutorials provided by NIH: <a href="https://www.genome.gov/27553900/encode-tutorials/">https://www.genome.gov/27553900/encode-tutorials/</a></div>
<div></div>
<div><a href="https://www.genome.gov/27562350/encode-workshop-april-2015-keystone-symposia/">https://www.genome.gov/27562350/encode-workshop-april-2015-keystone-symposia/</a></div>
<div><a href="https://www.genome.gov/27561253/encode-workshop-tutorial-october-2014-ashg/">https://www.genome.gov/27561253/encode-workshop-tutorial-october-2014-ashg/</a></div>
<div><a href="https://www.genome.gov/27553901/encode-tutorial-may-2013-biology-of-genomes-cshl/">https://www.genome.gov/27553901/encode-tutorial-may-2013-biology-of-genomes-cshl/</a></div>
<div></div>
<div><a href="https://www.genome.gov/27563006/encoderoadmap-epigenomics-tutorial-october-2015-ashg/">https://www.genome.gov/27563006/encoderoadmap-epigenomics-tutorial-october-2015-ashg/</a></div>
<div><a href="https://www.genome.gov/27555330/encoderoadmap-epigenomics-tutorial-october-2013-ashg/">https://www.genome.gov/27555330/encoderoadmap-epigenomics-tutorial-october-2013-ashg/</a></div>
<div><a href="https://www.genome.gov/27551933/encoderoadmap-epigenomics-tutorial-nov-2012-ashg/">https://www.genome.gov/27551933/encoderoadmap-epigenomics-tutorial-nov-2012-ashg/</a></div>
<div></div>
<div><a href="http://useast.ensembl.org/info/website/tutorials/encode.html">http://useast.ensembl.org/info/website/tutorials/encode.html</a></div>
<div></div>
<div></div>
<div><a href="https://www.encodeproject.org/tutorials/">https://www.encodeproject.org/tutorials/</a></div>
<div><a href="https://www.encodeproject.org/tutorials/encode-meeting-2016/">https://www.encodeproject.org/tutorials/encode-meeting-2016/</a></div>
<div><a href="https://www.encodeproject.org/tutorials/encode-users-meeting-2015/">https://www.encodeproject.org/tutorials/encode-users-meeting-2015/</a></div>
</div>
<div></div>
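<div>The Encyclopedia of DNA Elements (ENCODE) project aims to describe all of the functional sequence elements encoded in the human genome. ENCODE was formally launched in September 2003 and drew in more than 440 researchers from 32 institutions in five countries (the United States, the United Kingdom, Spain, Japan, and Singapore). Over nine years of work it <span style="color: #ff0000;"><b>studied 147 tissue types, carried out 1,478 experiments, generated and analyzed more than 15 terabytes of raw data, and identified 4 million gene switches,</b></span> establishing which DNA segments can turn particular genes on or off and how these "switches" differ between cell types. It showed that much so-called "junk DNA" is in fact functional and plays an important part in gene regulation. The project reported biochemical activity for roughly 80% of the genome, a finding often summarized, somewhat too strongly, as "no DNA segment in the human body is useless".</div>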
<div></div>
<div></div>
<div></div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1825.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualizing a customTrack with the UCSC Genome Browser</title>
		<link>http://www.bio-info-trainee.com/1818.html</link>
		<comments>http://www.bio-info-trainee.com/1818.html#comments</comments>
		<pubDate>Tue, 26 Jul 2016 14:59:09 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[CHIP-seq]]></category>
		<category><![CDATA[customTrack]]></category>
		<category><![CDATA[Genome Browser]]></category>
		<category><![CDATA[UCSC]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1818</guid>
		<description><![CDATA[customTrack, which I render here as a user-defined track file for sequencing reads, lets us trace where our re &#8230; <a href="http://www.bio-info-trainee.com/1818.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
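				<content:encoded><![CDATA[<div>A customTrack, which I render here as a user-defined track file for sequencing reads, lets us see exactly which regions of the reference genome our reads aligned to, or follow the coverage and sequencing depth of each region of the reference genome. This post is translated from <a href="http://genome.ucsc.edu/goldenPath/help/customTrack.html">http://genome.ucsc.edu/goldenPath/help/customTrack.html</a>, which is very useful!</div>
<div>The UCSC Genome Browser is very easy to use and makes it convenient to browse how our sequencing data align to the reference genome. Because it defines a set of track file formats, users can easily upload their own track files. However, if a user does not view their data for more than 48 hours, UCSC deletes it by default, <span style="color: #ff0000;">unless it has been saved in a session. Users can also share their custom tracks with others.</span></div>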
<p><span id="more-1818"></span></p>
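<div>UCSC already provides a collection of customTrack examples: click the <a href="http://genome.ucsc.edu/goldenPath/customTracks/custTracks.html">Custom Tracks</a> link.</div>
<div>Custom track files are kept private. If you are interested, follow the four steps below (steps 5 and 6 are optional):</div>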
<div><b>Step 1. Format the data set</b></div>
<div>A great many track file formats are supported, notably standard GFF, and also: <a href="http://genome.ucsc.edu/goldenPath/help/bedgraph.html">bedGraph</a>, <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format4">GTF</a>, <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format2">PSL</a>, <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format1">BED</a>, <a href="http://genome.ucsc.edu/goldenPath/help/bigBed.html">bigBed</a>, <a href="http://genome.ucsc.edu/goldenPath/help/wiggle.html">WIG</a>, <a href="http://genome.ucsc.edu/goldenPath/help/bigGenePred.html">bigGenePred</a>, <a href="http://genome.ucsc.edu/goldenPath/help/bigMaf.html">bigMaf</a>, <a href="http://genome.ucsc.edu/goldenPath/help/bigChain.html">bigChain</a>, <a href="http://genome.ucsc.edu/goldenPath/help/bigPsl.html">bigPsl</a>, <a href="http://genome.ucsc.edu/goldenPath/help/bigWig.html">bigWig</a>, <a href="http://genome.ucsc.edu/goldenPath/help/bam.html">BAM</a>, <a href="http://genome.ucsc.edu/goldenPath/help/cram.html">CRAM</a>, <a href="http://genome.ucsc.edu/goldenPath/help/vcf.html">VCF</a>, <a href="http://genome.ucsc.edu/goldenPath/help/maf.html">MAF</a>, <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format1.7">BED detail</a>, <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format10">Personal Genome SNP</a>, <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format13">broadPeak</a>, <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format12">narrowPeak</a>, and <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format6.5">microarray</a> (BED15).</div>
<div>Chromosome names must follow the <em>chrN</em> convention and are case-sensitive! A single track file may also contain several annotation data sets.</div>
<div><b>Step 2. Define the Genome Browser display characteristics</b></div>
<div><b>Set the browser options that control whether the Genome Browser shows UCSC's other tracks (using the hide/dense/pack/squish/full display modes), including public data such as ENCODE. Add one or more optional <a href="http://genome.ucsc.edu/goldenPath/help/customTrack.html#lines">browser lines</a> to the beginning of your formatted data file to configure the overall display of the Genome Browser when it initially shows your annotation data. </b></div>
<div><b>There are a lot of options, but in practice you only need to define a handful of attributes.</b></div>
<div><b><b>Step 3. Define the annotation track display characteristics</b><br />
</b></div>
<div><b><b>Set how your own data are displayed, including the color, track name, and description. Following the browser lines--and immediately preceding the formatted data--add a <a href="http://genome.ucsc.edu/goldenPath/help/customTrack.html#TRACK">track line</a> to define the display attributes for your annotation data set. </b></b></div>
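<div>Steps 1 to 3 can be put together in a small Python sketch that assembles the text of a custom track: one browser line, one track line, then BED rows. The function name <code>custom_track</code>, the track name, and the coordinates are all made up for illustration; only a few of the available attributes are set.</div>
<pre>
```python
def custom_track(position, name, description, bed_rows, visibility=2):
    """Assemble custom-track text: browser line, track line, then BED rows."""
    lines = [
        "browser position {}".format(position),
        'track name={} description="{}" visibility={}'.format(
            name, description, visibility
        ),
    ]
    for chrom, start, end in bed_rows:
        # Chromosome names must be chrN-style and are case-sensitive.
        if not chrom.startswith("chr"):
            raise ValueError("expected a chrN-style name, got " + chrom)
        lines.append("{}\t{}\t{}".format(chrom, start, end))
    return "\n".join(lines) + "\n"


# Made-up regions, for illustration only.
rows = [("chr22", 20100000, 20100100), ("chr22", 20100500, 20100600)]
text = custom_track("chr22:20100000-20100900", "demo", "Demo regions", rows)
print(text, end="")
```
</pre>
<div>The resulting text can be pasted into the upload box described in step 4, or saved to a file and uploaded.</div>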
<div>The colors, shapes, and labels of the tracks in the figure below can all be configured; for the exact rules you will need to read the documentation in detail.</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/clipboard.png"><img class="alignnone size-full wp-image-1819" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/clipboard.png" alt="clipboard" width="662" height="138" /></a></div>
<div></div>
<div><b>Step 4. Display your annotation track in the Genome Browser</b></div>
<div><b>The key part is uploading your own file. The steps are: </b></div>
<div><b>open the Genome Browser <a href="http://genome.ucsc.edu/index.html" target="_blank">home page</a> ,</b><b>click the Genome Browser link in the top menu bar. </b></div>
<div><b>On the <a href="http://genome.ucsc.edu/cgi-bin/hgGateway">Gateway page</a> that displays, select the genome and assembly on which your annotation data is based, then click the "add custom tracks" button.<br />
</b></div>
<div>Find the link shown in the screenshot below and click it.<a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/clipboard1.png"><img class="alignnone size-full wp-image-1820" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/clipboard1.png" alt="clipboard1" width="941" height="349" /></a></div>
<div><b>On the Add Custom Tracks page, load the annotation track data or URL for your custom track into the upper text box and the track documentation (optional) into the lower text box, <span style="color: #ff0000;">then click the Submit button. </span>Tracks may be loaded by entering text, a URL, or a pathname on your local computer.<br />
</b></div>
<div>Users can submit custom track files in many formats;</div>
<div><b>see <a href="http://genome.ucsc.edu/goldenPath/help/customTrack.html#ADD_CT">Loading a Custom Track into the Genome Browser</a>.<br />
</b></div>
<div><b>After submitting, go back to the Genome Browser page to see your track; the tool does not redirect there automatically.</b></div>
<div><b> </b></div>
<div><b><b>Step 5. (Optional) Add details pages for individual track features</b><br />
</b></div>
<div><b><b><b>Step 6. (Optional) Share your annotation track with others</b><br />
</b></b></div>
<div><span style="font-family: Arial, Helvetica, sans-serif; font-size: medium;"><b>These steps are optional; explore them yourself: read the section <a href="http://genome.ucsc.edu/goldenPath/help/customTrack.html#SHARE">Sharing Your Annotation Track with Others</a>.</b></span></div>
<div></div>
<div><b><b>As a test, I added a wig file that UCSC itself provides: <a href="http://genome.ucsc.edu/goldenPath/help/examples/wiggleExample.txt">http://genome.ucsc.edu/goldenPath/help/examples/wiggleExample.txt</a>. </b></b><b><b>It displays as follows:<a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/clipboard2.png"><img class="alignnone size-full wp-image-1821" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/07/clipboard2.png" alt="clipboard" width="813" height="261" /></a></b></b></div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1818.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The wig, bigWig, and bedGraph file formats explained</title>
		<link>http://www.bio-info-trainee.com/1815.html</link>
		<comments>http://www.bio-info-trainee.com/1815.html#comments</comments>
		<pubDate>Tue, 26 Jul 2016 14:53:16 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[CHIP-seq]]></category>
		<category><![CDATA[bedgraph]]></category>
		<category><![CDATA[bigWig]]></category>
		<category><![CDATA[UCSC]]></category>
		<category><![CDATA[wig]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1815</guid>
		<description><![CDATA[Most of us are familiar with SAM/BAM files, the files produced by aligning sequencing reads to a reference genome &#8230; <a href="http://www.bio-info-trainee.com/1815.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
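				<content:encoded><![CDATA[<div>Most of us are familiar with SAM/BAM files, the files produced by aligning sequencing reads to a reference genome. BAM or BED files are mainly for tracing which regions of the reference genome our reads aligned to. The formats defined by UCSC (wig, bigWig, and bedGraph) serve a different purpose: they only track the coverage and sequencing depth of each region of the reference genome. Files in these formats can also be loaded straight into the UCSC Genome Browser for visualization!</div>
<div>This site provides scripts for building and converting these formats: <a href="http://barcwiki.wi.mit.edu/wiki/SOPs/coordinates">http://barcwiki.wi.mit.edu/wiki/SOPs/coordinates</a></div>
<div>For single-end (SE) data, <code>macs2 pileup --extsize 200 -i $sample.bam -o $sample.bdg</code> converts a BAM file to a bedGraph file without a peak-calling step.</div>
<div>The <code>bedGraphToBigWig</code> tool can be downloaded from the UCSC download server; <code>bedGraphToBigWig $sample.bdg ~/reference/genome/mm10/mm10.chrom.sizes $sample.bw</code> converts the bedGraph file to a bigWig (bw) file. The other conversion tools can be downloaded from the same place.</div>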
<p><span id="more-1815"></span></p>
<div>For the exact format definitions, go straight to the UCSC website. What follows is my own translation based on my understanding, nothing special; I suggest reading the original, translating it yourself, and comparing your version with mine!</div>
<div><b>Wiggle Track Format (WIG): </b><a href="http://genome.ucsc.edu/goldenPath/help/wiggle.html">http://genome.ucsc.edu/goldenPath/help/wiggle.html</a></div>
<div><b>bigWig Track Format: </b><a href="http://genome.ucsc.edu/goldenPath/help/bigWig.html">http://genome.ucsc.edu/goldenPath/help/bigWig.html</a></div>
<div><b>BedGraph Track Format: </b><a href="http://genome.ucsc.edu/goldenPath/help/bedgraph.html">http://genome.ucsc.edu/goldenPath/help/bedgraph.html</a></div>
<div>All three formats are defined by UCSC, so it provides a set of tools for converting between them; prebuilt executables can be downloaded directly: <a href="http://hgdownload.cse.ucsc.edu/admin/exe/">http://hgdownload.cse.ucsc.edu/admin/exe/</a></div>
<div>Commonly used tools:</div>
<div>
<ul>
<li><code>bigWigToBedGraph</code> — this program converts a bigWig file to ASCII <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format1.8">bedGraph</a> format.</li>
<li><code>bigWigToWig</code> — this program converts a bigWig file to <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format6">wig</a> format.</li>
<li><code>bigWigSummary</code> — this program extracts summary information from a bigWig file.</li>
<li><code>bigWigAverageOverBed</code> — this program computes the average score of a bigWig over each bed, which may have introns.</li>
<li><code>bigWigInfo</code> — this program prints out information about a bigWig file.</li>
</ul>
</div>
<div>In fact, samtools can also easily report the coverage and sequencing depth of a genomic region from our BAM file, for example:</div>
<div>
<div>samtools depth -r <span style="color: #ff0000;"><b>chr12:126073855-126073965</b></span>  Ip.sorted.bam</div>
<blockquote>
<div>chr12    126073855    5</div>
<div>chr12    126073856    15</div>
<div>chr12    126073857    31</div>
<div>chr12    126073858    40</div>
<div>chr12    126073859    44</div>
<div>chr12    126073860    52</div>
<div>~~~~~~~~~remaining output omitted~~~~~~~~~</div>
</blockquote>
</div>
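<div>This is essentially a wig file in embryo, though a real wig file is a little more complex!</div>
<div>First, the first column is not needed, because it just repeats the same value; the chromosome only has to be declared once, on the first line of each chromosome's block.</div>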
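<div>To make the relationship concrete, here is a minimal Python sketch that turns <code>samtools depth</code> rows into wig data lines, declaring the chromosome once per block instead of on every line. The function name <code>depth_to_wig</code> is hypothetical and the input is the sample output shown above. Both formats use 1-based positions, so no coordinate shift is needed; a track line would still go at the top of a real file.</div>
<pre>
```python
def depth_to_wig(depth_lines):
    """Turn `samtools depth` rows (chrom, 1-based pos, depth) into wig lines.

    A new variableStep declaration is written whenever the chromosome
    changes, so the chromosome is not repeated on every data line.
    """
    out = []
    current = None
    for line in depth_lines:
        chrom, pos, depth = line.split()
        if chrom != current:
            out.append("variableStep chrom={}".format(chrom))
            current = chrom
        out.append("{} {}".format(pos, depth))
    return out


depth = [
    "chr12\t126073855\t5",
    "chr12\t126073856\t15",
    "chr12\t126073857\t31",
]
print("\n".join(depth_to_wig(depth)))
```
</pre>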
<div>Next, set the attributes that control how this wig file is displayed in the UCSC Genome Browser:</div>
<div>
<pre><b>track type</b>=wiggle_0 <b>name</b>=<i>track_label</i> <b>description=</b><i>center_label</i>
        <b>visibility=</b><i>display_mode</i> <b>color=</b><i>r,g,b</i> <b>altColor=</b><i>r,g,b</i>
        <b>priority=</b><i>priority</i> <b>autoScale=</b><i>on|off</i> <b>alwaysZero=</b><i>on|off</i>
        <b>gridDefault=</b><i>on|off</i> <b>maxHeightPixels=</b><i>max:default:min</i>
        <b>graphType=</b><i>bar|points</i> <b>viewLimits=</b><i>lower:upper</i>
        <b>yLineMark=</b><i>real-value</i> <b>yLineOnOff=</b><i>on|off</i>
        <b>windowingFunction=</b><i>mean+whiskers|maximum|mean|minimum</i>
        <b>smoothingWindow=</b>off|2-16</pre>
<pre><b>type</b>=wiggle_0 is the default and, so far, the only value allowed. All the other parameters are optional; see the official documentation.</pre>
<p>You can usually ignore these parameters unless you are already very familiar with the UCSC Genome Browser.</p>
</div>
<div>Then each chromosome's data block needs its own declaration line. The most important parameters are:
<pre><b>fixedStep</b> <b>chrom=</b><i>chrN</i> <b>start=</b><i>position</i> <b>step=</b><i>stepInterval</i> <b>[span=</b><i>windowSize</i><b>]</b></pre>
</div>
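<div>To make the declaration line concrete, here is a minimal Python sketch that splits such a line into its options and lists the start position of each data value. The function names are hypothetical, and the sketch assumes the options are separated by whitespace, that <code>span</code> defaults to 1 when omitted, and that wig positions are 1-based.</div>

```python
def parse_declaration(line):
    """Split a wig declaration line into its step type and key=value options."""
    fields = line.split()
    step_type = fields[0]  # "fixedStep" or "variableStep"
    options = dict(field.split("=", 1) for field in fields[1:])
    # Convert the numeric options that are present.
    for key in ("start", "step", "span"):
        if key in options:
            options[key] = int(options[key])
    # Assumption: span is 1 when the line does not give it.
    options.setdefault("span", 1)
    return step_type, options


def fixed_step_positions(options, n_values):
    """1-based start position of each of the n_values data lines in a fixedStep block."""
    return [options["start"] + i * options["step"] for i in range(n_values)]


step_type, opts = parse_declaration("fixedStep chrom=chr3 start=400601 step=100 span=5")
print(step_type, opts)
print(fixed_step_positions(opts, 3))
```

<div>A <code>variableStep</code> block has no <code>start</code> or <code>step</code>; each of its data lines carries its own position instead.</div>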
<div>
<div><b>Here is a concrete wig example:</b></div>
<blockquote>
<div><b>track type=wiggle_0 name=hek description=hek</b></div>
<div><b>variableStep chrom=chr1 span=10</b></div>
<div>10008    7</div>
<div>10018    14</div>
<div>10028    27</div>
<div>10038    37</div>
<div>10048    45</div>
<div>10058    43</div>
<div>10068    37</div>
<div>10078    26</div>
<div>~~~~~~~~~ remaining output omitted ~~~~~~~~~</div>
</blockquote>
</div>
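<div>UCSC also provides an example wig file: <a href="http://genome.ucsc.edu/goldenPath/help/examples/wiggleExample.txt">http://genome.ucsc.edu/goldenPath/help/examples/wiggleExample.txt</a></div>
<div>As you can see, I set very few parameters, and I produced the wig file directly from a sorted BAM file with a script.</div>
<div>There is not much more to say about the bigWig format: it is simply the compressed binary version of a wig file, which saves space.</div>
<div>You only need to convert your own wig file with the tool UCSC provides. The steps are as follows:</div>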
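<div>The script itself is not shown in this post, so the following is only a sketch of the idea, not the original code. It assumes the input is the three-column text printed by <code>samtools depth</code> (chromosome, 1-based position, depth) and writes one value per base, whereas the example above uses span=10. The function name <code>depth_to_wig</code> is made up for illustration.</div>

```python
def depth_to_wig(depth_lines, name="sample"):
    """Turn 'chrom pos depth' lines into variableStep wig text (one value per base)."""
    out = ["track type=wiggle_0 name=%s description=%s" % (name, name)]
    current_chrom = None
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, depth = line.split()
        if chrom != current_chrom:
            # The chromosome is declared once instead of being repeated on every line.
            out.append("variableStep chrom=%s" % chrom)
            current_chrom = chrom
        # Both samtools depth and wig use 1-based positions, so no shift is needed.
        out.append("%s\t%s" % (pos, depth))
    return "\n".join(out) + "\n"


example = [
    "chr12    126073855    5",
    "chr12    126073856    15",
    "chr12    126073857    31",
]
print(depth_to_wig(example, name="Ip"), end="")
```

<div>In practice the lines would come from a file or a pipe rather than a list, for instance by iterating over <code>sys.stdin</code>.</div>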
<blockquote>
<div>
<ul>
<li>Save this <a href="http://genome.ucsc.edu/goldenPath/help/examples/wigVarStepExample.gz">wiggle file</a> to your machine (this satisfies <em>steps 1</em> and <em>2</em> above).</li>
<li>Save this <a href="http://genome.ucsc.edu/goldenPath/help/hg19.chrom.sizes">text file</a> to your machine. It contains the chrom.sizes for the human (hg19) assembly (this satisfies <em>step 4</em> above).</li>
<li>Download the <code>wigToBigWig</code> utility (see <em>step 3</em>).</li>
<li>Run the utility to create the bigWig output file (see <em>step 5</em>):<br />
<code><b>wigToBigWig</b> wigVarStepExample.gz hg19.chrom.sizes myBigWig.bw</code></li>
</ul>
</div>
</blockquote>
<div></div>
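<div>Finally, the bedGraph format. It is an extension of BED: a 4-column BED file preceded by a track line carrying display attributes for the UCSC Genome Browser. In practice only a handful of those attributes need to be defined.</div>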
<div>
<blockquote>
<pre><b>track type=bedGraph name=</b><i>track_label</i> <b>description=</b><i>center_label</i>
        <b>visibility=</b><i>display_mode</i> <b>color=</b><i>r,g,b</i> <b>altColor=</b><i>r,g,b</i>
        <b>priority=</b><i>priority</i> <b>autoScale=</b><i>on|off</i> <b>alwaysZero=</b><i>on|off</i>
        <b>gridDefault=</b><i>on|off</i> <b>maxHeightPixels=</b><i>max:default:min</i>
        <b>graphType=</b><i>bar|points</i> <b>viewLimits=</b><i>lower:upper</i>
        <b>yLineMark=</b><i>real-value</i> <b>yLineOnOff=</b><i>on|off</i>
        <b>windowingFunction=</b><i>maximum|mean|minimum</i> <b>smoothingWindow=</b><i>off|2-16</i></pre>
</blockquote>
<p>One point to note: these coordinates are <a href="http://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1">zero-based, half-open</a>.</p>
</div>
<div> Chromosome positions are specified as 0-relative. The first chromosome position is 0. The last position in a chromosome of length N would be N - 1. Only positions specified have data.</div>
<div> Positions not specified do not have data and will not be graphed.</div>
<div>All positions specified in the input data must be in numerical order.</div>
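<div>As an illustration of the zero-based, half-open convention, here is a rough Python sketch that converts per-base depth with 1-based positions (as printed by <code>samtools depth</code>) into bedGraph intervals, merging neighbouring bases that share the same depth. The function name is hypothetical, and the input is assumed to be sorted by chromosome and position already.</div>

```python
def depth_to_bedgraph(records):
    """records: iterable of (chrom, pos_1based, depth), sorted by chrom then position.

    Returns a list of (chrom, start0, end, depth) tuples in zero-based,
    half-open coordinates, with adjacent equal-depth bases merged.
    """
    intervals = []
    for chrom, pos, depth in records:
        start = pos - 1  # 1-based position becomes a zero-based start
        end = pos        # half-open: the end coordinate is not included
        if intervals:
            last_chrom, last_start, last_end, last_depth = intervals[-1]
            if last_chrom == chrom and last_end == start and last_depth == depth:
                # Same depth and directly adjacent: extend the previous interval.
                intervals[-1] = (chrom, last_start, end, depth)
                continue
        intervals.append((chrom, start, end, depth))
    return intervals


per_base = [("chr1", 9998, 1), ("chr1", 9999, 1), ("chr1", 10000, 2), ("chr1", 10001, 4)]
for chrom, start, end, depth in depth_to_bedgraph(per_base):
    print("%s\t%d\t%d\t%d" % (chrom, start, end, depth))
```

<div>Bases that are missing from the input simply produce no interval, which matches the rule that unspecified positions carry no data.</div>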
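<div>Here is a bedGraph file that MACS produced as a by-product of calling peaks on ChIP-seq data; a file like this can also be generated directly from a BAM file with other tools:</div>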
<div>
<blockquote>
<div><span style="color: #ff0000;"><b>track type=bedGraph name="hek_treat_all" description="Extended tag pileup from MACS version 1.4.2 20120305"</b></span></div>
<div>chr1    9997    9999    1</div>
<div>chr1    9999    10000   2</div>
<div>chr1    10000   10001   4</div>
<div>chr1    10001   10003   5</div>
<div>chr1    10003   10007   6</div>
<div>chr1    10007   10010   7</div>
<div>chr1    10010   10012   8</div>
<div>chr1    10012   10015   9</div>
<div>chr1    10015   10016   10</div>
<div>chr1    10016   10017   11</div>
<div>chr1    10017   10018   12</div>
<div></div>
</blockquote>
</div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1815.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How the various genome versions correspond to one another</title>
		<link>http://www.bio-info-trainee.com/1469.html</link>
		<comments>http://www.bio-info-trainee.com/1469.html#comments</comments>
		<pubDate>Tue, 15 Mar 2016 11:50:00 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[Basic data formats]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ENSEMBL]]></category>
		<category><![CDATA[ncbi]]></category>
		<category><![CDATA[UCSC]]></category>
		<category><![CDATA[Genome versions]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1469</guid>
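		<description><![CDATA[Inspired by SOAPfuse, I decided to compile how the various genome versions correspond to one another. This is the complete edition! &#8230; <a href="http://www.bio-info-trainee.com/1469.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<pre>Inspired by SOAPfuse, I decided to compile how the various genome versions correspond to one another. This is the complete edition!</pre>
<pre>No more confusion over genome versions. I have also tracked down all the download links, so you can fetch the genome FASTA files, GTF annotation files and so on for any version.</pre>
<div>First, how the NCBI assemblies map to UCSC names and to Ensembl releases:</div>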
<div></div>
<div>
<blockquote>
<div>NCBI36 (hg18; often loosely written GRCh36): ENSEMBL release_52.</div>
<div>GRCh37 (hg19): ENSEMBL release_59/61/64/68/69/75.</div>
<div>GRCh38 (hg38): ENSEMBL  release_76/77/78/80/81/82.</div>
</blockquote>
<div></div>
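<div>As you can see, Ensembl release numbering is especially complicated and easy to mix up!</div>
<div>UCSC versions are much simpler: just hg18, hg19 and hg38. hg19 is the one most commonly used, but I recommend that everyone move to hg38.</div>
<div>NCBI looks simple too, with just builds 36, 37 and 38, but there is a lot going on underneath:</div>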
<div>
<blockquote>
<pre>Feb 13 2014 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/April_14_2003/">April_14_2003</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.33/">BUILD.33</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.34.1/">BUILD.34.1</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.34.2/">BUILD.34.2</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.34.3/">BUILD.34.3</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.35.1/">BUILD.35.1</a>
Aug 03 2009 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.36.1/">BUILD.36.1</a>
Aug 03 2009 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.36.2/">BUILD.36.2</a>
Sep 04 2012 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.36.3/">BUILD.36.3</a>
Jun 30 2011 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.1/">BUILD.37.1</a>
Sep 07 2011 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.2/">BUILD.37.2</a>
Dec 12 2012 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.3/">BUILD.37.3</a></pre>
</blockquote>
</div>
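<div>As you can see, there are 37.1, 37.2, 37.3 and so on. These minor versions generally mean that the annotation was updated; the genome sequence itself usually does not change.</div>
<div>In any case, just remember that the hg19 genome is about 3 GB, or roughly 800 to 900 MB when compressed.</div>
<div></div>
<div>When downloading a GTF annotation file, the genome version matters a great deal!</div>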
<div></div>
<div>For NCBI: <span style="font-family: Arial,Helvetica,sans-serif;"><a href="ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/GFF/">ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/GFF/          ## latest version (hg38)</a></span></div>
<div><span style="font-family: Arial,Helvetica,sans-serif;"><a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/">ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/    ## other versions</a></span></div>
<div></div>
<div>For Ensembl:</div>
<div><a href="ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz" rel="nofollow">ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz</a></div>
<div>Change the release number in the middle of the URL to get any other version: <a href="ftp://ftp.ensembl.org/pub/">ftp://ftp.ensembl.org/pub/</a></div>
<div>For UCSC, it is a little more involved:</div>
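<div>Swapping the release number can be sketched in Python as follows. The lookup table holds only the human releases listed earlier in this post, and the URL template is extrapolated from the single release-75 link above, so treat both as assumptions and check the FTP listing before relying on a generated URL. The names <code>RELEASES</code>, <code>assembly_for_release</code> and <code>ensembl_gtf_url</code> are invented for this example.</div>

```python
# Human releases exactly as listed in this post; other releases are not covered.
RELEASES = {
    "GRCh37": (59, 61, 64, 68, 69, 75),
    "GRCh38": (76, 77, 78, 80, 81, 82),
}

# Assumption: the file naming seen for release 75 also holds for the others.
URL_TEMPLATE = (
    "ftp://ftp.ensembl.org/pub/release-{release}/gtf/homo_sapiens/"
    "Homo_sapiens.{assembly}.{release}.gtf.gz"
)


def assembly_for_release(release):
    """Return the assembly name that this post associates with an Ensembl release."""
    for assembly, releases in RELEASES.items():
        if release in releases:
            return assembly
    raise ValueError("release %d is not in the table from this post" % release)


def ensembl_gtf_url(release):
    """Build the GTF download URL for one of the listed human releases."""
    return URL_TEMPLATE.format(release=release, assembly=assembly_for_release(release))


print(ensembl_gtf_url(75))
```

<div>Raising an error for unlisted releases is deliberate: guessing the assembly for a release the table does not cover is exactly the kind of version mix-up this post is trying to avoid.</div>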
<div>
<div>You need to select a series of options:</div>
<div><a href="http://genome.ucsc.edu/cgi-bin/hgTables">http://genome.ucsc.edu/cgi-bin/hgTables</a></div>
<div></div>
<blockquote>
<div>1. Navigate to <a href="http://genome.ucsc.edu/cgi-bin/hgTables" target="_blank" rel="nofollow">http://genome.ucsc.edu/cgi-bin/hgTables</a></div>
<div></div>
<div>2. Select the following options:<br />
clade: Mammal<br />
genome: Human<br />
assembly: Feb. 2009 (GRCh37/hg19)<br />
group: Genes and Gene Predictions<br />
track: UCSC Genes<br />
table: knownGene<br />
region: Select "genome" for the entire genome.<br />
output format: GTF - gene transfer format<br />
output file: enter a file name to save your results to a file, or leave blank to display results in the browser</div>
<div></div>
<div>3. Click 'get output'.</div>
</blockquote>
</div>
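<div>Now for the main point: with the version relationships sorted out, it is time to download.</div>
<div>Downloading from UCSC is very convenient. You only need to build the URL from the short name of the assembly:</div>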
<div>
<blockquote>
<div><a href="http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz</a></div>
<div><a href="http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz</a></div>
<div><a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz</a></div>
<div><a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/chromFa.tar.gz</a></div>
</blockquote>
<div>Or use a shell script to download specific chromosomes:</div>
<blockquote>
<div>for i in $(seq 1 22) X Y M;<br />
do echo $i;<br />
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr${i}.fa.gz;</div>
<div>## NCBI can be used here as well, e.g. ftp://ftp.ncbi.nih.gov/genomes/M_musculus/ARCHIVE/MGSCv3_Release3/Assembled_Chromosomes/ plus the chr prefix (note that this particular path is a mouse assembly)<br />
done<br />
gunzip chr*.fa.gz<br />
for i in $(seq 1 22) X Y M;<br />
do cat chr${i}.fa &gt;&gt; hg19.fasta;<br />
done<br />
## remove the per-chromosome files now that they are merged into hg19.fasta<br />
rm -f chr*.fa</div>
</blockquote>
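<div>The URL patterns used above can be written down as two small Python helpers. This is only a sketch: the function names are hypothetical, the patterns are copied from the links and the shell loop in this post, and an assembly's download directory may not use exactly the same file names, so check its listing first.</div>

```python
BASE = "http://hgdownload.cse.ucsc.edu/goldenPath"


def chrom_fa_tarball_url(assembly):
    """URL of the all-chromosomes archive, following the links listed in this post."""
    return "%s/%s/bigZips/chromFa.tar.gz" % (BASE, assembly)


def chromosome_urls(assembly, autosomes=22):
    """Per-chromosome FASTA URLs in the same order as the shell loop: 1..N, X, Y, M."""
    names = [str(n) for n in range(1, autosomes + 1)] + ["X", "Y", "M"]
    return ["%s/%s/chromosomes/chr%s.fa.gz" % (BASE, assembly, n) for n in names]


print(chrom_fa_tarball_url("mm10"))
for url in chromosome_urls("hg19")[:3]:
    print(url)
```

<div>The <code>autosomes</code> parameter is there because the number of autosomes differs between species; mouse, for example, would need 19 rather than 22.</div>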
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1469.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fetching a base sequence from chromosome start and end coordinates</title>
		<link>http://www.bio-info-trainee.com/1049.html</link>
		<comments>http://www.bio-info-trainee.com/1049.html#comments</comments>
		<pubDate>Fri, 16 Oct 2015 11:21:27 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Bioinformatics basics]]></category>
		<category><![CDATA[UCSC]]></category>
		<category><![CDATA[Genome]]></category>
		<category><![CDATA[Sequence extraction]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1049</guid>
		<description><![CDATA[This time I want to introduce a very handy tool. Often we have a chromosome name and the chromosome start and end &#8230; <a href="http://www.bio-info-trainee.com/1049.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div dir="ltr">
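<div>This time I want to introduce a very handy tool. Often we have a chromosome name together with start and end positions, and we want to know which bases make up that stretch. Normally we would look it up in the UCSC genome browser, which returns a great deal of information, more than most people can fully digest. But what if all I want is the sequence?</div>
<div>Admittedly, we could download the 3 GB hg19.fa file and write a script to pull out the sequence, but that is too much trouble. Needs like this are usually one-off, so naturally we want something very simple.</div>
<div>Here is a very simple method built on a CGI program, and you do not have to write any code yourself. UCSC has already written the program; you only need to construct the URL. For example, for chr17:7676091,7676196 I only need to build the following URL:</div>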
<div><a href="http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment=chr17:7676091,7676196">http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment=chr17:7676091,7676196</a></div>
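<div>hg38 can be replaced with hg19, and the part after <a href="http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment=chr17:7676091,7676196">dna?segment=</a> can be changed, following the same chrom:start,end format, to return whatever sequence we want.</div>
<div>The page returns the result as XML, which just needs to be parsed.</div>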
<div>
<div>&lt;DASDNA&gt;</div>
<div>&lt;SEQUENCE id="chr17" start="7676091" stop="7676196" version="1.00"&gt;</div>
<div>&lt;DNA length="106"&gt;</div>
<div>aggggccaggagggggctggtgcaggggccgccggtgtaggagctgctgg tgcaggggccacggggggagcagcctctggcattctgggagcttcatctg gacctg</div>
<div>&lt;/DNA&gt;</div>
<div>&lt;/SEQUENCE&gt;</div>
<div>&lt;/DASDNA&gt;</div>
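<div>Clearly, the aggggccaggagggggctggtgcaggggccgccggtgtaggagctgctgg tgcaggggccacggggggagcagcctctggcattctgggagcttcatctg gacctg inside is the sequence we want, once the whitespace left over from line breaks is removed.</div>
<div>Go ahead and try it.</div>
<div>Of course, you can query more than DNA, and you are not limited to human either.</div>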
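<div>Building the URL and parsing the reply could look roughly like this in Python. It is a sketch that assumes the service still answers in the XML shape shown above; the function names are made up, and the download step is left out so that the parsing can be tried on saved text.</div>

```python
import xml.etree.ElementTree as ET

DAS_BASE = "http://genome.ucsc.edu/cgi-bin/das"


def das_dna_url(assembly, chrom, start, end):
    """Build the query URL in the chrom:start,end format used in this post."""
    return "%s/%s/dna?segment=%s:%d,%d" % (DAS_BASE, assembly, chrom, start, end)


def extract_sequence(xml_text):
    """Pull the bases out of a DASDNA reply and strip the line-break whitespace."""
    root = ET.fromstring(xml_text)
    dna = root.find("SEQUENCE/DNA")
    if dna is None or dna.text is None:
        raise ValueError("no DNA element in the response")
    sequence = "".join(dna.text.split())
    # The length attribute lets us check that nothing was lost.
    declared = int(dna.get("length"))
    if len(sequence) != declared:
        raise ValueError("expected %d bases, got %d" % (declared, len(sequence)))
    return sequence


print(das_dna_url("hg38", "chr17", 7676091, 7676196))
```

<div>To fetch a live reply, the URL can be read with <code>urllib.request.urlopen</code> from the standard library and the decoded text passed to <code>extract_sequence</code>.</div>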
<div>See <a href="http://www.biodas.org/">http://www.biodas.org</a> for more info on DAS.<br />
Try <a href="http://genome.ucsc.edu/cgi-bin/das/dsn">http://genome.ucsc.edu/cgi-bin/das/dsn</a> for a list of databases.</div>
<div></div>
<div>
<pre>X-DAS-Version: DAS/0.95
X-DAS-Status: 200
Content-Type:text
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: X-DAS-Version X-DAS-Status X-DAS-Capabilities

UCSC DAS Server.
See <a href="http://www.biodas.org/">http://www.biodas.org</a> for more info on DAS.
Try <a href="http://genome.ucsc.edu/cgi-bin/das/dsn">http://genome.ucsc.edu/cgi-bin/das/dsn</a> for a list of databases.
See our DAS FAQ (<a href="http://genome.ucsc.edu/FAQ/FAQdownloads#download23">http://genome.ucsc.edu/FAQ/FAQdownloads#download23</a>)
for more information.  Alternatively, we also provide query capability
through our MySQL server; please see our FAQ for details
(<a href="http://genome.ucsc.edu/FAQ/FAQdownloads#download29">http://genome.ucsc.edu/FAQ/FAQdownloads#download29</a>).

Note that DAS is an inefficient protocol which does not support
all types of annotation in our database.  We recommend you
access the UCSC database by downloading the tab-separated files in
the downloads section (<a href="http://hgdownload.cse.ucsc.edu/downloads.html">http://hgdownload.cse.ucsc.edu/downloads.html</a>)
or by using the Table Browser (<a href="http://genome.ucsc.edu/cgi-bin/hgTables">http://genome.ucsc.edu/cgi-bin/hgTables</a>)
instead of DAS in most circumstances.</pre>
</div>
</div>
</div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1049.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
