<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; CpG Islands</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/cpg-islands/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Leave a comment and join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Four ways to download CpG Islands annotation files</title>
		<link>http://www.bio-info-trainee.com/2141.html</link>
		<comments>http://www.bio-info-trainee.com/2141.html#comments</comments>
		<pubDate>Thu, 15 Dec 2016 11:25:31 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[Bioinformatics basics]]></category>
		<category><![CDATA[CpG Islands]]></category>
		<category><![CDATA[UCSC]]></category>
		<category><![CDATA[Coordinate records]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2141</guid>
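		<description><![CDATA[This is also the question readers write in about most often: how to download the start and end coordinates of particular regions of the genome, genomi &#8230; <a href="http://www.bio-info-trainee.com/2141.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>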
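				<content:encoded><![CDATA[<p>This is also the question readers write in about most often: how to download the start and end coordinates of particular regions of the genome. These are genomic feature questions, and the answer is usually a GTF or BED file: for example, the coordinate file for all exons on human hg19, for all genes, for all lncRNAs, rRNAs, and so on. Here I use four ways of downloading the CpG Islands file as a worked example:<span id="more-2141"></span></p>
<div>First, make sure you understand a few concepts yourself: CpGI, CpG shore, and CpG shelf regions</div>
<div>The simplest option, and my first recommendation, is the UCSC Table Browser (<a href="https://genome-euro.ucsc.edu/cgi-bin/hgTables">https://genome-euro.ucsc.edu/cgi-bin/hgTables</a>), with BED chosen as the output format (it is ordinary plain text)</div>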
<div>BED (Browser Extensible Data) format provides a flexible way to define the data lines that are displayed in an annotation track</div>
<div>Below is a simple example that fetches the CpG island coordinate file for mm10; the file is generated on the fly according to what you request:</div>
<div>
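<div>As you can probably see, combining the options in the screenshot in different ways lets you download all kinds of annotation files, including the coordinates of genes, exons, transcripts, and so on.</div>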
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/31.png"><img class="alignnone size-full wp-image-2142" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/31.png" alt="3" width="725" height="502" /></a></div>
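<div>The second way is to browse the UCSC download (FTP) site directly and fetch the file: <a href="http://hgdownload.soe.ucsc.edu/downloads.html">http://hgdownload.soe.ucsc.edu/downloads.html</a>. Click on "Human", then "Annotation Database", and finally "cpgIslandExt.txt.gz". For other assemblies you only need to change the assembly name in the URL:</div>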
<div><a href="http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/">http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/</a></div>
<div><a href="http://hgdownload.soe.ucsc.edu/goldenPath/mm9/database/">http://hgdownload.soe.ucsc.edu/goldenPath/mm9/database/</a></div>
<div><a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/">http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/</a></div>
<div><a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/">http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/</a></div>
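<div>Just search for the file in the directory listing. You can see that the two methods give the same data, and that mouse has far fewer annotated CpG islands than human. My guess was that mouse is simply less thoroughly studied, but that is only a guess: the UCSC track is predicted computationally from the genome sequence, so the difference may just as well reflect the sequences themselves.</div>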
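<div>The "just change the URL" idea can be sketched in a few lines of Python. This is only an illustration, not an official UCSC client: the function names are made up for this example, and the column layout (a leading bin column followed by chrom, chromStart, chromEnd, name) is an assumption that should be checked against the cpgIslandExt.sql file in the same directory.</div>

```python
import gzip
import io
import urllib.request

# URL pattern copied from the directory links above; only the assembly changes.
BASE = "http://hgdownload.soe.ucsc.edu/goldenPath/{assembly}/database/cpgIslandExt.txt.gz"


def cpg_island_url(assembly):
    """Return the cpgIslandExt.txt.gz URL for one assembly, e.g. 'mm10'."""
    return BASE.format(assembly=assembly)


def parse_cpg_islands(gz_bytes):
    """Convert the gzipped table dump into (chrom, start, end, name) tuples.

    Assumption: column 1 is UCSC's 'bin' index, followed by chrom,
    chromStart, chromEnd and name. Extra columns are ignored.
    """
    islands = []
    with gzip.open(io.BytesIO(gz_bytes), mode="rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 5:  # skip blank or truncated lines
                islands.append((fields[1], int(fields[2]), int(fields[3]), fields[4]))
    return islands


def fetch_cpg_islands(assembly):
    """Download and parse the table for one assembly (needs network access)."""
    with urllib.request.urlopen(cpg_island_url(assembly)) as response:
        return parse_cpg_islands(response.read())
```

<div>Calling fetch_cpg_islands("mm10") and fetch_cpg_islands("hg19") and comparing the lengths of the two lists is one quick way to check the mouse versus human counts for yourself.</div>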
<div></div>
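<div>Of course, the BioMart interface of the Ensembl database can do the same thing.<br />
Finally, BioMart can also be queried from R through the biomaRt package.</div>
<div>That covers all four methods!</div>
<div>In addition, I strongly recommend the genomic-feature packages in R: they are easy to learn and pay off enormously!</div>
<div>At heart, it all comes down to understanding TxDb and GenomicRanges objects.</div>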
<blockquote>
<div>## https://www.bioconductor.org/packages/devel/data/annotation/?TxDb<br />
?GenomicRanges</p>
<p>library(TxDb.Mmusculus.UCSC.mm10.knownGene)<br />
library(TxDb.Hsapiens.UCSC.hg19.knownGene)<br />
library(EnsDb.Hsapiens.v75)<br />
library(EnsDb.Mmusculus.v79)<br />
ls('package:EnsDb.Mmusculus.v79')</p>
<p>library(BSgenome.Hsapiens.UCSC.hg19.masked)<br />
library(BSgenome.Hsapiens.UCSC.hg19)</p>
<p># genes() below queries the mouse EnsDb, so that is the package that must be loaded<br />
library(EnsDb.Mmusculus.v79)<br />
annoData &lt;- genes(EnsDb.Mmusculus.v79)<br />
annoData[1:2];length(annoData)<br />
ranges(annoData[1:2])</p>
<p># Dump the TxDb into a plain list and look at its genes component<br />
txdb &lt;- TxDb.Mmusculus.UCSC.mm10.knownGene<br />
txdb_dump &lt;- as.list(txdb)<br />
txdb_dump$genes</p>
</div>
<div></div>
</blockquote>
<div>To get CpG shore regions, take the 2000 bp flanking each CpG island: from (start - 2000) to start on one side, and from end to (end + 2000) on the other.</div>
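<div>As a rough sketch of that shore arithmetic in Python: cpg_shores and the labels "shore_up" and "shore_down" are names invented for this example, and the input is assumed to be BED-style tuples (0-based start, exclusive end).</div>

```python
def cpg_shores(islands, flank=2000):
    """Return the regions flanking each island as (chrom, start, end, label).

    islands: iterable of (chrom, start, end) in BED coordinates.
    flank:   shore width in bp on each side (2000 in the text above).
    """
    shores = []
    for chrom, start, end in islands:
        # Clamp at 0 so a shore never starts before the chromosome does.
        upstream_start = max(0, start - flank)
        if upstream_start != start:  # island at position 0 has no upstream shore
            shores.append((chrom, upstream_start, start, "shore_up"))
        shores.append((chrom, end, end + flank, "shore_down"))
    return shores
```

<div>For an island at chr1:28735-29810 this gives the shores chr1:26735-28735 and chr1:29810-31810. The sketch does not clip at the chromosome end and does not remove overlaps with neighbouring islands; both would need chromosome sizes and an extra interval-subtraction step.</div>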
<div></div>
<div></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2141.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
