
Four ways to download CpG island annotation files

ulwvfje — Thu, 15 Dec 2016 11:25:31 +0000

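This is also the question readers write in about most often: how to download the start and end coordinates of particular genomic regions, that is, genomic features. Such annotations usually come as GTF or BED files, for example the coordinates of all exons on human hg19, of all genes, or of all lncRNAs, rRNAs, and so on. Here I use four ways of downloading the CpG island annotation file as a worked example.

First, make sure you understand a few concepts: CpG islands (CpGI), CpG shores, and CpG shelf regions.

The simplest option, and my first recommendation, is the UCSC Table Browser (https://genome-euro.ucsc.edu/cgi-bin/hgTables), with the output written in BED format (plain text).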

BED (Browser Extensible Data) format provides a flexible way to define the data lines that are displayed in an annotation track

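Below is a simple example that retrieves the CpG island coordinates for mm10; the file is generated on the fly according to the options you choose:

As you can probably guess, combining the options above in different ways lets you download all kinds of annotation files, including gene coordinates, exon coordinates, transcript coordinates, and so on.

The second way is to look for the file directly on the download server, http://hgdownload.soe.ucsc.edu/downloads.html. Click on "Human", then "Annotation Database", and finally "cpgIslandExt.txt.gz". In practice you only need to change the assembly name in the URL: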

http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/

http://hgdownload.soe.ucsc.edu/goldenPath/mm9/database/

http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/

http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/

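Just search for the file in the directory listing. The two methods yield the same data. Mouse also has far fewer annotated CpG islands than human. It is tempting to blame this on mouse being less thoroughly studied, but the UCSC track is predicted computationally from the genome sequence, so the difference more likely reflects sequence composition.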
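The URL pattern above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the column order assumed for cpgIslandExt (bin, chrom, chromStart, chromEnd, name, ...) should be checked against the table's .sql description in the same directory before relying on it.

```python
UCSC_DOWNLOAD_ROOT = "http://hgdownload.soe.ucsc.edu/goldenPath"


def cpg_island_url(assembly):
    """Build the annotation-database URL of cpgIslandExt.txt.gz for one assembly."""
    return f"{UCSC_DOWNLOAD_ROOT}/{assembly}/database/cpgIslandExt.txt.gz"


def cpg_island_row_to_bed(line):
    """Turn one tab-separated cpgIslandExt row into a 4-column BED line.

    Assumption: the leading columns are bin, chrom, chromStart, chromEnd, name.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, name = fields[1:5]
    return "\t".join([chrom, start, end, name])


if __name__ == "__main__":
    # Print the download address for each assembly mentioned in the post.
    for assembly in ("mm10", "mm9", "hg19", "hg38"):
        print(cpg_island_url(assembly))
```

Dropping the leading bin column is what makes the downloaded table line up with the BED output of the Table Browser.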

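Of course, the BioMart interface of the Ensembl database can do the same thing.
Finally, BioMart also has an R package, biomaRt, that works too.

That covers all four methods!

In addition, I strongly recommend the genomic-feature packages in R: they are easy to learn and pay off enormously.

In essence, it comes down to understanding TxDb and GenomicRanges objects.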

## Available TxDb packages: https://www.bioconductor.org/packages/devel/data/annotation/?TxDb
library(GenomicRanges)
?GenomicRanges

## Transcript (TxDb) and Ensembl (EnsDb) annotation packages
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(EnsDb.Hsapiens.v75)
library(EnsDb.Mmusculus.v79)
## List the objects exported by the package
ls('package:EnsDb.Mmusculus.v79')

## Genome sequence packages
library(BSgenome.Hsapiens.UCSC.hg19.masked)
library(BSgenome.Hsapiens.UCSC.hg19)

## Gene coordinates as a GRanges object
annoData <- genes(EnsDb.Mmusculus.v79)
annoData[1:2]
length(annoData)
ranges(annoData[1:2])

## Dump a TxDb into a plain list and inspect its gene table
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
txdb_dump <- as.list(txdb)
txdb_dump$genes

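To get the CpG shore regions, extend each CpG island by 2000 bp on both sides: subtract 2000 bp from the island start and add 2000 bp to the island end. The two flanking stretches are the shores.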
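A minimal Python sketch of that shore calculation follows. The function name and the optional chrom_size argument are assumptions made for this example, and the coordinates are taken to be BED-style (0-based, half-open).

```python
def cpg_shores(chrom, start, end, flank=2000, chrom_size=None):
    """Return the upstream and downstream shore intervals of one CpG island.

    Coordinates are BED-style: 0-based start, exclusive end.
    """
    shores = []

    # Upstream shore: the flank before the island, clipped at position 0.
    upstream_start = max(0, start - flank)
    if upstream_start < start:
        shores.append((chrom, upstream_start, start))

    # Downstream shore: the flank after the island, clipped at the chromosome end
    # when the chromosome size is known.
    downstream_end = end + flank
    if chrom_size is not None:
        downstream_end = min(downstream_end, chrom_size)
    if end < downstream_end:
        shores.append((chrom, end, downstream_end))

    return shores


if __name__ == "__main__":
    for shore in cpg_shores("chr1", 28735, 29810):
        print(*shore, sep="\t")
```

The sketch treats each island on its own; if two islands lie closer than 4 kb, their shores overlap and would need to be trimmed or merged in a separate step.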

Six ways to download all ENCODE project data

ulwvfje — Thu, 28 Jul 2016 14:50:00 +0000

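I will not dwell on the importance of the Encyclopedia of DNA Elements (ENCODE) project; if you are not yet familiar with it, jump to the end of this post, download the ENCODE tutorials, and study them carefully. The project uses the high-throughput sequencing technologies listed below to characterize genome-wide gene regulatory elements in more than 100 different cell lines and tissues. It originally targeted human only; later, the same kinds of data were generated and analyzed for model organisms such as mouse and fly, an effort known as modENCODE.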

chromatin structure (5C)

open chromatin (DNase-seq and FAIRE-seq)

histone modifications and DNA-binding of over 100 transcription factors (ChIP-seq)

RNA transcription (RNAseq and CAGE)

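All of the data are now public (http://genome.ucsc.edu/ENCODE/). ENCODE results from 2007 and later are available from the ENCODE Project Portal, encodeproject.org. The results were published simultaneously in 30 papers in Nature, Science, Cell, JBC, Genome Biol, and Genome Research (http://www.nature.com/encode).

Everything can be downloaded, from the raw sequencing data to the aligned signal files and the analyzed, biologically meaningful peak files.

Based on what I have learned so far, here is a brief overview of six ways to download ENCODE data: the official ENCODE website, UCSC, Ensembl, the Broad Institute, IHEC, and GEO.

First, at UCSC:

The address is http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/. Because you browse the files directly, you can pick out the data you are interested in from the folder structure and file names, which is why this route suits me best.

Many of you are probably used to visualizing ChIP-seq results with the UCSC Genome Browser, which has a great many options controlling which online resources are displayed alongside your own data for comparison. It therefore has to have a download server holding those data, and the best-known collection there is the ENCODE project data.

I mostly care about the ENCODE histone data, so click through to it.

Typically, each cell line has data for every histone mark, and everything from the sequencing reads to the aligned BAM files and the called peaks can be downloaded.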

Next, downloading from the official ENCODE website:

The ENCODE website also documents the various data-processing pipelines: https://www.encodeproject.org/pipelines/

RNA-seq pipelines

RAMPAGE pipeline

Chromatin pipelines(Histone ChIP-seq Pipeline/Transcription Factor ChIP-seq Pipeline)

Methylation pipeline(WGBS Pipeline Overview)

Data download on the official site works much like an online shop: you add the datasets you need to a cart and then download them all in one go.

This document describes what data are available at the ENCODE Portal, ways to get started searching and downloading data, and an overview to how the metadata describing the assays and reagents are organized. ENCODE data can be visualized and accessed from other resources, including the UCSC Genome Browser and ENSEMBL.

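At https://www.encodeproject.org/matrix/?type=Experiment you can see 173 cell lines, 148 tissues, and a number of cancer samples, covering more than a dozen kinds of high-throughput sequencing data, including ChIP-seq and DNase-seq.

Next, the GEO database:

This page lists every ENCODE-related GSE study: http://www.ncbi.nlm.nih.gov/geo/info/ENCODE.html

There is not much to say about GEO: go to the study page and download the data. This is also one of my preferred ways to download data, because GEO describes each experiment in detail.

Next, the ENCODE data hosted by the Broad Institute:

The renowned Broad Institute seems to be the most comprehensive resource site in bioinformatics. It hosts not only all of the ENCODE project data but also the software and tools it used to analyze them.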

http://www.broadinstitute.org/~anshul/projects/encode

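The raw data are at http://www.broadinstitute.org/~anshul/projects/encode/rawdata/

Next, the data hosted by IHEC:

http://epigenomesportal.ca/ihec/download.html

This was the first time I had come across this data portal. It also lets you browse folders and files directly, so you can simply download what you need:

Besides ENCODE, data from the Blueprint and Roadmap projects can also be downloaded.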


CEEHRC	2014-09-18
Blueprint	2014-08-11
ENCODE	2011-01
NIH Roadmap	2014-05-29
DEEP	2014-08-15
CREST JST	2014-09-12
KNIH	2015-07-15

Finally, the Ensembl database:

I could not find a direct download address; see http://asia.ensembl.org/info/website/tutorials/encode.html

The full ENCODE datasets that were used in the Ensembl regulatory build can also be viewed in the Ensembl GRCh37 archive, by attaching a track hub to Region in Detail - the link below will do this automatically:

Link to add ENCODE integrative analysis hub

This creates a menu in the Control Panel on Region in Detail, from which you can add individual tracks or groups of tracks using matrix selectors. Cell type and experimental factor are the two principal axes; other dimensions can be selected by clicking on a box to open an additional submenu (see below).

If you are not very familiar with the ENCODE project, start with some tutorials:

ENCODE tutorials provided by the NIH: https://www.genome.gov/27553900/encode-tutorials/

https://www.genome.gov/27562350/encode-workshop-april-2015-keystone-symposia/

https://www.genome.gov/27561253/encode-workshop-tutorial-october-2014-ashg/

https://www.genome.gov/27553901/encode-tutorial-may-2013-biology-of-genomes-cshl/

https://www.genome.gov/27563006/encoderoadmap-epigenomics-tutorial-october-2015-ashg/

https://www.genome.gov/27555330/encoderoadmap-epigenomics-tutorial-october-2013-ashg/

https://www.genome.gov/27551933/encoderoadmap-epigenomics-tutorial-nov-2012-ashg/

http://useast.ensembl.org/info/website/tutorials/encode.html

https://www.encodeproject.org/tutorials/

https://www.encodeproject.org/tutorials/encode-meeting-2016/

https://www.encodeproject.org/tutorials/encode-users-meeting-2015/

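The Encyclopedia of DNA Elements (ENCODE) project aims to describe all functional sequence elements encoded in the human genome. ENCODE was formally launched in September 2003 and drew more than 440 researchers from 32 institutions in five countries: the United States, the United Kingdom, Spain, Japan, and Singapore. Over nine years of work, it studied 147 tissue types, ran 1,478 experiments, and generated and analyzed more than 15 terabytes of raw data. It identified about 4 million gene switches, clarified which DNA segments can turn particular genes on or off, and showed how these switches differ between cell types. The project argued that so-called "junk DNA" is largely functional and carries out gene-regulatory roles, and, on that reading, that hardly any DNA segment in the human body is useless.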

Visualizing custom tracks with the UCSC Genome Browser

ulwvfje — Tue, 26 Jul 2016 14:59:09 +0000

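A customTrack, which I think of as a user-defined track file for sequencing data, lets you follow where your reads align on the reference genome, or follow the coverage and sequencing depth of each region of the reference genome. This post is translated from http://genome.ucsc.edu/goldenPath/help/customTrack.html, which is extremely useful.

The UCSC Genome Browser is very convenient for viewing how your sequencing data align to the reference genome. Because a set of track file formats is well defined, users can easily upload their own track files. However, if a user has not viewed the data for more than 48 hours, UCSC deletes them by default unless they have been saved in a session. Users can also share their custom tracks.

UCSC already provides a set of example custom tracks: click the Custom Tracks link

Custom track files are kept private. If you are interested, follow the four main steps below (steps 5 and 6 are optional):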

Step 1. Format the data set

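Many track formats are supported, in particular standard GFF, as well as bedGraph, GTF, PSL, BED, bigBed, WIG, bigGenePred, bigMaf, bigChain, bigPsl, bigWig, BAM, CRAM, VCF, MAF, BED detail, Personal Genome SNP, broadPeak, narrowPeak, and microarray (BED15).

Chromosome names must be of the form chrN and are case-sensitive. A file may also contain several annotation tracks, of the same or of different types.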
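Because the chrN naming is case-sensitive, files that use bare chromosome labels often need a small renaming step before upload. The sketch below is one possible way to do that in Python; the function name is invented for this example, and mapping MT to chrM follows the common Ensembl-to-UCSC convention rather than anything stated in the UCSC help page.

```python
def to_ucsc_chrom(name):
    """Convert a chromosome label such as '1', 'x' or 'MT' to UCSC chrN style."""
    label = name.strip()

    # Drop an existing prefix in any letter case, e.g. 'CHR2' -> '2'.
    if label.lower().startswith("chr"):
        label = label[3:]

    # The mitochondrial genome is 'MT' in Ensembl files and 'chrM' at UCSC.
    if label.upper() in ("M", "MT"):
        return "chrM"

    # Sex chromosomes are written in upper case at UCSC.
    if label.upper() in ("X", "Y"):
        return "chr" + label.upper()

    return "chr" + label


if __name__ == "__main__":
    for raw in ("1", "CHR2", "x", "MT", "chrY"):
        print(raw, "->", to_ucsc_chrom(raw))
```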

Step 2. Define the Genome Browser display characteristics

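Configure the browser options: whether the Genome Browser shows UCSC's other data types, using display modes such as hide/dense/pack/squish/full, and whether public data such as ENCODE are displayed. Add one or more optional browser lines to the beginning of your formatted data file to configure the overall display of the Genome Browser when it initially shows your annotation data.

This can get quite complicated, but in practice you only need to define a handful of attributes.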

Step 3. Define the annotation track display characteristics

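Configure how your own data are displayed, including the color, the track name, and the track description. Following the browser lines--and immediately preceding the formatted data--add a track line to define the display attributes for your annotation data set.

The color, shape, and labels of the tracks shown in the browser can all be configured; for the exact rules you will need to read the documentation carefully.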
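To make Steps 2 and 3 concrete, here is a small Python sketch that assembles a browser line and a track line. The helper names are hypothetical; the attributes used (name, description, visibility, color) are among those listed in the UCSC documentation, and quoting values that contain spaces is the convention used in UCSC's own examples.

```python
def browser_position_line(chrom, start, end):
    """Build a 'browser position' line that sets the initial viewing window."""
    return f"browser position {chrom}:{start}-{end}"


def track_line(track_type=None, **attributes):
    """Build a track line from key=value display attributes."""
    parts = ["track"]
    if track_type is not None:
        parts.append(f"type={track_type}")
    for key, value in attributes.items():
        text = str(value)
        # Values containing spaces must be quoted to stay a single attribute.
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


if __name__ == "__main__":
    print(browser_position_line("chr12", 126073855, 126073965))
    print(track_line("bedGraph", name="hek", description="Extended tag pileup",
                     visibility="full", color="255,0,0"))
```

The two printed lines go at the very top of the data file, browser lines first and the track line immediately before the data rows.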

Step 4. Display your annotation track in the Genome Browser

The key point is uploading your own file. The steps are:

Open the Genome Browser home page, then click the Genome Browser link in the top menu bar.

On the Gateway page that displays, select the genome and assembly on which your annotation data is based, then click the "add custom tracks" button.

Look for that link on the page and click it.

On the Add Custom Tracks page, load the annotation track data or URL for your custom track into the upper text box and the track documentation (optional) into the lower text box, then click the Submit button. Tracks may be loaded by entering text, a URL, or a pathname on your local computer.

Custom track files can be submitted in a variety of formats.

see Loading a Custom Track into the Genome Browser.

After submitting, go back to the Genome Browser page yourself to see the track; the tool does not redirect automatically.

Step 5. (Optional) Add details pages for individual track features

Step 6. (Optional) Share your annotation track with others

This step is optional and left for you to explore: read the section Sharing Your Annotation Track with Others.

As a test, I added a wig file that UCSC itself provides: http://genome.ucsc.edu/goldenPath/help/examples/wiggleExample.txt

The wig, bigWig, and bedGraph file formats explained

ulwvfje — Tue, 26 Jul 2016 14:53:16 +0000

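Most of us are familiar with SAM/BAM files, which hold sequencing reads after alignment to a reference genome. BAM and BED files mainly track where on the reference genome our reads aligned. The formats defined by UCSC (wig, bigWig, and bedGraph) serve a different purpose: they only track the coverage, or sequencing depth, of each region of the reference genome. Files in these formats load directly into the UCSC Genome Browser for visualization.

This site provides scripts for building and converting between these formats: http://barcwiki.wi.mit.edu/wiki/SOPs/coordinates

For single-end (SE) data, macs2 pileup --extsize 200 -i $sample.bam -o $sample.bdg converts a BAM file to a bedGraph file without needing the peak-calling step.

The bedGraphToBigWig tool can be downloaded from the UCSC download server; bedGraphToBigWig $sample.bdg ~/reference/genome/mm10/mm10.chrom.sizes $sample.bw converts the bedGraph file to a bigWig (bw) file. The other conversion tools can be downloaded there as well.

For the exact format definitions, go straight to the UCSC website. What follows is my own translation based on my understanding; there is nothing special about it, and I suggest reading the original, writing your own translation, and comparing it with mine.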

Wiggle Track Format (WIG): http://genome.ucsc.edu/goldenPath/help/wiggle.html

bigWig Track Format: http://genome.ucsc.edu/goldenPath/help/bigWig.html

BedGraph Track Format: http://genome.ucsc.edu/goldenPath/help/bedgraph.html

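All three formats are defined by UCSC, which therefore provides a set of tools for converting between them. Prebuilt executables can be downloaded from http://hgdownload.cse.ucsc.edu/admin/exe/

Commonly used tools include: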

bigWigToBedGraph — this program converts a bigWig file to ASCII bedGraph format.
bigWigToWig — this program converts a bigWig file to wig format.
bigWigSummary — this program extracts summary information from a bigWig file.
bigWigAverageOverBed — this program computes the average score of a bigWig over each bed, which may have introns.
bigWigInfo — this program prints out information about a bigWig file.

In fact, samtools can also easily report the coverage and sequencing depth of a genomic region from a BAM file, for example:

samtools depth -r chr12:126073855-126073965 Ip.sorted.bam

chr12    126073855    5

chr12    126073856    15

chr12    126073857    31

chr12    126073858    40

chr12    126073859    44

chr12    126073860    52

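~~~~~~~~~ remaining output omitted ~~~~~~~~~

This is essentially a wig file in embryonic form, although a real wig file is slightly more complex.

First, the first column is no longer needed, because it repeats the same value on every row; the chromosome only has to be declared once, on the line that opens each chromosome's block.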
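The idea of folding the repeated chromosome column into a single declaration line can be sketched as follows. This is an illustrative Python helper, not the script used for this post; the function name and the default track name are invented. Both samtools depth and wig use 1-based positions, so the positions are copied unchanged.

```python
def depth_to_wig(depth_lines, track_name="depth"):
    """Convert `samtools depth` rows (chrom, position, depth) to variableStep wig lines."""
    wig = [f"track type=wiggle_0 name={track_name}"]
    current_chrom = None
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, position, depth = line.split()
        # Declare the chromosome once, at the start of each chromosome's block.
        if chrom != current_chrom:
            wig.append(f"variableStep chrom={chrom}")
            current_chrom = chrom
        wig.append(f"{position} {depth}")
    return wig


if __name__ == "__main__":
    rows = [
        "chr12\t126073855\t5",
        "chr12\t126073856\t15",
        "chr12\t126073857\t31",
    ]
    print("\n".join(depth_to_wig(rows)))
```

The input is assumed to be sorted by chromosome, as samtools depth output is for a sorted BAM; otherwise a chromosome would be declared more than once.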

Next, the file needs a track line that sets how the wig file is displayed in the UCSC Genome Browser:

track type=wiggle_0 name=track_label description=center_label visibility=display_mode color=r,g,b altColor=r,g,b priority=priority autoScale=on|off alwaysZero=on|off gridDefault=on|off maxHeightPixels=max:default:min graphType=bar|points viewLimits=lower:upper yLineMark=real-value yLineOnOff=on|off windowingFunction=mean+whiskers|maximum|mean|minimum smoothingWindow=off|2-16

type=wiggle_0 is the one required setting, and so far it must take exactly this value. Everything else is optional; see the official documentation.

You can usually ignore these parameters unless you already know the UCSC Genome Browser well.

Then each chromosome block needs a declaration line. The important parameters are:

fixedStep chrom=chrN start=position step=stepInterval [span=windowSize]
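To see what a fixedStep declaration means, the following Python sketch expands one block into explicit intervals. The function name is hypothetical, the example values are made up, and span is taken to default to 1 when omitted, as in the UCSC wiggle documentation.

```python
def expand_fixed_step(declaration, values):
    """Expand a fixedStep block into (chrom, first, last, value) tuples.

    Positions are 1-based and inclusive, as in the wig format itself.
    """
    fields = declaration.split()
    if not fields or fields[0] != "fixedStep":
        raise ValueError("not a fixedStep declaration line")
    params = dict(field.split("=", 1) for field in fields[1:])

    chrom = params["chrom"]
    start = int(params["start"])
    step = int(params["step"])
    span = int(params.get("span", 1))

    rows = []
    for index, value in enumerate(values):
        first = start + index * step          # each value advances by `step`
        rows.append((chrom, first, first + span - 1, value))
    return rows


if __name__ == "__main__":
    block = expand_fixed_step("fixedStep chrom=chr3 start=400601 step=100 span=5", [11, 22, 33])
    for row in block:
        print(*row, sep="\t")
```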

Here is a concrete wig example:

track type=wiggle_0 name=hek description=hek

variableStep chrom=chr1 span=10

10008    7

10018    14

10028    27

10038    37

10048    45

10058    43

10068    37

10078    26

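~~~~~~~~~ remaining output omitted ~~~~~~~~~

UCSC also provides an example wig file: http://genome.ucsc.edu/goldenPath/help/examples/wiggleExample.txt

As you can see, I set very few parameters; I generated the wig file directly from a sorted BAM file with a script.

There is not much more to say about the bigWig format: it is the compressed binary version of the wig format, which saves space.

You only need to convert your own wig file with the tool UCSC provides. The steps are as follows (the step numbers in parentheses refer to the UCSC bigWig help page):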

Save this wiggle file to your machine (this satisfies steps 1 and 2 above).

Save this text file to your machine. It contains the chrom.sizes for the human (hg19) assembly (this satisfies step 4 above).

Download the wigToBigWig utility (see step 3).

Run the utility to create the bigWig output file (see step 5):
wigToBigWig wigVarStepExample.gz hg19.chrom.sizes myBigWig.bw

Finally, the bedGraph format. It is an extension of BED that uses four BED-style columns, and it also needs a track line with display settings for the UCSC Genome Browser, although in practice you only define a handful of attributes.

track type=bedGraph name=track_label description=center_label visibility=display_mode color=r,g,b altColor=r,g,b priority=priority autoScale=on|off alwaysZero=on|off gridDefault=on|off maxHeightPixels=max:default:min graphType=bar|points viewLimits=lower:upper yLineMark=real-value yLineOnOff=on|off windowingFunction=maximum|mean|minimum smoothingWindow=off|2-16

One point to note: These coordinates are zero-based, half-open.

Chromosome positions are specified as 0-relative. The first chromosome position is 0. The last position in a chromosome of length N would be N - 1. Only positions specified have data.

Positions not specified do not have data and will not be graphed.

All positions specified in the input data must be in numerical order.
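The coordinate rule is the part that most often causes off-by-one errors when moving between wig (1-based positions) and bedGraph (zero-based, half-open). A short Python sketch of the conversion for a variableStep block, with a made-up function name:

```python
def variable_step_to_bedgraph(chrom, span, rows):
    """Convert variableStep wig rows to bedGraph tuples.

    `rows` holds (position, value) pairs with 1-based wig positions.
    Returns (chrom, start, end, value) with a 0-based start and exclusive end.
    """
    bedgraph = []
    for position, value in rows:
        start = position - 1   # 1-based first base -> 0-based start
        end = start + span     # half-open end covers `span` bases
        bedgraph.append((chrom, start, end, value))
    return bedgraph


if __name__ == "__main__":
    wig_rows = [(10008, 7), (10018, 14), (10028, 27)]
    for record in variable_step_to_bedgraph("chr1", 10, wig_rows):
        print(*record, sep="\t")
```

With span=10, the wig position 10008 covers bases 10008 to 10017, which becomes the bedGraph interval 10007 to 10017.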

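Here is a bedGraph file produced as a by-product of calling peaks on ChIP-seq data with MACS; the same kind of file can also be generated directly from a BAM file with other tools: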

track type=bedGraph name="hek_treat_all" description="Extended tag pileup from MACS version 1.4.2 20120305"

chr1    9997    9999    1

chr1    9999    10000   2

chr1    10000   10001   4

chr1    10001   10003   5

chr1    10003   10007   6

chr1    10007   10010   7

chr1    10010   10012   8

chr1    10012   10015   9

chr1    10015   10016   10

chr1    10016   10017   11

chr1    10017   10018   12

How the various genome assembly versions correspond

ulwvfje — Tue, 15 Mar 2016 11:50:00 +0000

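It was SOAPfuse that prompted me to compile a complete list of how the various genome versions correspond.

With this you no longer need to worry about mixing up genome versions. I have also tracked down the download links, so you can download the genome FASTA files, GTF annotation files, and so on for any version.

First, how NCBI versions correspond to UCSC and to Ensembl:

NCBI36 (hg18): ENSEMBL release_52.

GRCh37 (hg19): ENSEMBL release_59/61/64/68/69/75.

GRCh38 (hg38): ENSEMBL release_76/77/78/80/81/82.

As you can see, Ensembl release numbers are particularly complicated and easy to mix up.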
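One way to avoid mixing these up in scripts is to keep the correspondence in a lookup table. The Python sketch below encodes only the releases listed above; the names are invented for this example, and hg18 is recorded under its NCBI Build 36 name (NCBI36), since the GRCh prefix only starts with build 37.

```python
# UCSC short name -> NCBI/GRC assembly name.
ASSEMBLY_NAMES = {"hg18": "NCBI36", "hg19": "GRCh37", "hg38": "GRCh38"}

# UCSC short name -> Ensembl releases listed in this post (not exhaustive).
ENSEMBL_RELEASES = {
    "hg18": (52,),
    "hg19": (59, 61, 64, 68, 69, 75),
    "hg38": (76, 77, 78, 80, 81, 82),
}


def ucsc_assembly_for_release(release):
    """Return the UCSC assembly name for a listed Ensembl release, or None."""
    for assembly, releases in ENSEMBL_RELEASES.items():
        if release in releases:
            return assembly
    return None


if __name__ == "__main__":
    for release in (52, 75, 82):
        assembly = ucsc_assembly_for_release(release)
        print(release, assembly, ASSEMBLY_NAMES[assembly])
```

Releases that are not in the table return None on purpose, so that an unlisted release is never silently assigned to an assembly.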

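UCSC versions are much simpler: just hg18, hg19, and hg38. hg19 is the most commonly used, but I recommend moving to hg38.

NCBI looks simple too, with just builds 36, 37, and 38, but there is more to it than meets the eye: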

Feb 13 2014 00:00    Directory April_14_2003
Apr 06 2006 00:00    Directory BUILD.33
Apr 06 2006 00:00    Directory BUILD.34.1
Apr 06 2006 00:00    Directory BUILD.34.2
Apr 06 2006 00:00    Directory BUILD.34.3
Apr 06 2006 00:00    Directory BUILD.35.1
Aug 03 2009 00:00    Directory BUILD.36.1
Aug 03 2009 00:00    Directory BUILD.36.2
Sep 04 2012 00:00    Directory BUILD.36.3
Jun 30 2011 00:00    Directory BUILD.37.1
Sep 07 2011 00:00    Directory BUILD.37.2
Dec 12 2012 00:00    Directory BUILD.37.3

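As you can see, there are builds 37.1, 37.2, 37.3, and so on, but these minor versions generally mean the annotation was updated; the genome sequence itself usually stays the same.

In any case, just remember that the hg19 genome is about 3 GB, or roughly 800 to 900 MB compressed.

If you want to download a GTF annotation file, the genome version matters a great deal.

For NCBI: ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/GFF/ ## latest version (hg38)

ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/ ## other versions

For Ensembl: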

ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz

Change the release number in the middle of the path to get any other version: ftp://ftp.ensembl.org/pub/
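Swapping the release number can be written as a small URL builder. This Python sketch simply generalizes the one URL shown above; the function name is invented, and it is an assumption that every release follows exactly the same file naming, so check the directory listing if a download fails.

```python
def ensembl_gtf_url(release, assembly="GRCh37", species="homo_sapiens"):
    """Build an Ensembl GTF URL following the release-75 pattern shown in the post."""
    # File names capitalize the first letter of the species, e.g. Homo_sapiens.
    file_species = species.capitalize()
    return (
        f"ftp://ftp.ensembl.org/pub/release-{release}/gtf/{species}/"
        f"{file_species}.{assembly}.{release}.gtf.gz"
    )


if __name__ == "__main__":
    print(ensembl_gtf_url(75))
    print(ensembl_gtf_url(82, assembly="GRCh38"))
```

The assembly name has to match the release, so a GRCh38-era release needs assembly="GRCh38" passed explicitly.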

For UCSC it is a bit more involved:

You need to choose a series of options:

http://genome.ucsc.edu/cgi-bin/hgTables

1. Navigate to http://genome.ucsc.edu/cgi-bin/hgTables

2. Select the following options:
clade: Mammal
genome: Human
assembly: Feb. 2009 (GRCh37/hg19)
group: Genes and Gene Predictions
track: UCSC Genes
table: knownGene
region: Select "genome" for the entire genome.
output format: GTF - gene transfer format
output file: enter a file name to save your results to a file, or leave blank to display results in the browser

3. Click 'get output'.

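Now for the important part: once the version relationships are clear, it is time to download.

Downloading from UCSC is very convenient; you just build the URL from the assembly's short name: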

http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz

http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz

http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/chromFa.tar.gz

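Or use a shell script to download specific chromosomes:

## NCBI can be used here as well: ftp://ftp.ncbi.nih.gov/genomes/M_musculus/ARCHIVE/MGSCv3_Release3/Assembled_Chromosomes/ (files with a chr prefix)
## Download each hg19 chromosome
for i in $(seq 1 22) X Y M;
do echo $i;
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr${i}.fa.gz;
done
gunzip *.gz
## Concatenate the chromosomes in order into a single FASTA file
for i in $(seq 1 22) X Y M;
do cat chr${i}.fa >> hg19.fasta;
done
## Remove the per-chromosome files (they are named chr*.fa, not chr*.fasta)
rm -f chr*.fa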
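The same loop can be sketched in Python for readers who prefer it. This is only an outline with invented function names: it builds the per-chromosome URLs and concatenates already-downloaded .fa.gz files into one FASTA, and it leaves the download step itself to wget or a similar tool.

```python
import gzip
import shutil


def chromosome_urls(assembly="hg19"):
    """List the per-chromosome FASTA URLs for chr1-chr22, chrX, chrY and chrM."""
    names = [str(number) for number in range(1, 23)] + ["X", "Y", "M"]
    base = f"http://hgdownload.cse.ucsc.edu/goldenPath/{assembly}/chromosomes"
    return [f"{base}/chr{name}.fa.gz" for name in names]


def concatenate_gzipped_fasta(gz_paths, output_path):
    """Decompress each .fa.gz in order and append it to a single FASTA file."""
    with open(output_path, "wb") as out:
        for path in gz_paths:
            with gzip.open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


if __name__ == "__main__":
    for url in chromosome_urls("hg19"):
        print(url)
```

Reading the gzip files directly avoids the separate gunzip and cleanup steps, at the cost of keeping the compressed per-chromosome files around.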

Fetching a base sequence from chromosome start and end coordinates

ulwvfje — Fri, 16 Oct 2015 11:21:27 +0000

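This time I want to introduce a very practical tool. We often have a chromosome name plus start and end positions and want to know the bases in that interval. Normally we would look it up in the UCSC Genome Browser, which returns a huge amount of information, more than most people can fully take in. But what if all I want is the sequence?

Admittedly, we could download the 3 GB hg19.fa file and write a script to extract the sequence, but that is a lot of trouble, and since this kind of need is usually a one-off, we would much rather have a quick method.

Here is a very simple one. It relies on a CGI program, but you do not need to write any code: UCSC has already written the program, and you only need to construct the right URL. For example, for chr17:7676091,7676196, I just build the following address:

http://genome.ucsc.edu/cgi-bin/das/hg38/dna?segment=chr17:7676091,7676196

hg38 can be replaced with hg19, and the part after dna?segment= can be changed following the same format, to return whichever sequence you want.

The page returns the result as XML, which you then parse.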


aggggccaggagggggctggtgcaggggccgccggtgtaggagctgctgg tgcaggggccacggggggagcagcctctggcattctgggagcttcatctg gacctg

Clearly, the aggggccaggagggggctggtgcaggggccgccggtgtaggagctgctgg tgcaggggccacggggggagcagcctctggcattctgggagcttcatctg gacctg inside it is the sequence we want.
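Building the address and pulling the sequence out of the XML can be sketched in Python as below. The function names are invented, and the sample document in the usage section is an assumption about the response layout (a DNA element nested inside the reply), so inspect a real response before relying on it.

```python
import xml.etree.ElementTree as ET


def das_dna_url(assembly, chrom, start, end):
    """Build the UCSC DAS address for a region, following the URL shown above."""
    return f"http://genome.ucsc.edu/cgi-bin/das/{assembly}/dna?segment={chrom}:{start},{end}"


def sequence_from_das_xml(xml_text):
    """Extract the bases from a DAS DNA reply, removing spaces and line breaks."""
    root = ET.fromstring(xml_text)
    dna = root.find(".//DNA")  # assumed element name holding the sequence
    if dna is None or dna.text is None:
        raise ValueError("no DNA element found in the response")
    return "".join(dna.text.split())


if __name__ == "__main__":
    print(das_dna_url("hg38", "chr17", 7676091, 7676196))
    # Hypothetical, shortened response used only to exercise the parser.
    sample = (
        '<DASDNA><SEQUENCE id="chr17" start="7676091" stop="7676110">'
        '<DNA length="20">\naggggccagg\nagggggctgg\n</DNA></SEQUENCE></DASDNA>'
    )
    print(sequence_from_das_xml(sample))
```

Fetching the XML itself is left out of the sketch; the text could be retrieved with urllib.request.urlopen or any HTTP client and then passed to sequence_from_das_xml.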

Go and try it out!

Of course, you can query more than just DNA, and more than just human.


X-DAS-Version: DAS/0.95
X-DAS-Status: 200
Content-Type:text
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: X-DAS-Version X-DAS-Status X-DAS-Capabilities

UCSC DAS Server.
See http://www.biodas.org for more info on DAS.
Try http://genome.ucsc.edu/cgi-bin/das/dsn for a list of databases.
See our DAS FAQ (http://genome.ucsc.edu/FAQ/FAQdownloads#download23)
for more information.  Alternatively, we also provide query capability
through our MySQL server; please see our FAQ for details
(http://genome.ucsc.edu/FAQ/FAQdownloads#download29).

Note that DAS is an inefficient protocol which does not support
all types of annotation in our database.  We recommend you
access the UCSC database by downloading the tab-separated files in
the downloads section (http://hgdownload.cse.ucsc.edu/downloads.html)
or by using the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables)
instead of DAS in most circumstances.