Downstream functional analysis of ChIP-seq peaks with the web-based tool GREAT

ulwvfje — Thu, 07 Jul 2016 12:57:16 +0000

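Once a ChIP-seq run is finished, and provided the experimental design is sound and the sequencing quality is acceptable, it is easy to call peaks that meet your criteria from the reads. Alternatively, you can download plenty of meaningful peak files from published papers or from Roadmap, usually in BED format. The next step is to annotate and visualize these peaks, and then to use the peak-associated genes for downstream analyses such as enrichment against pathway databases, MSigDB annotation, and Gene Ontology annotation. For this I strongly recommend GREAT, a web-based tool developed by researchers at Stanford University.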

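The tool was created mainly to address the lack of annotation for non-coding regions of the genome, which is usually where the peaks from our ChIP-seq experiments fall.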

First, go to the tool's home page: http://bejerano.stanford.edu/great/public/html/

The tool accepts only one file per submission, namely the file recording the peaks we called. BED format is supported.
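Before uploading, it can help to check that the peak file really is a clean BED file. The following is only a minimal sketch, assuming that the first three whitespace-separated columns (chromosome, start, end) are all that is needed and that `track`, `browser`, and `#` lines can be dropped. The function names are made up for illustration and are not part of GREAT.

```python
def parse_bed_line(line):
    """Return (chrom, start, end) for one BED record, or None for header/blank lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "track", "browser")):
        return None
    fields = stripped.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}: {line!r}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # BED intervals are 0-based and half-open, so end must be greater than start.
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval: {line!r}")
    return chrom, start, end


def to_minimal_bed(lines):
    """Keep only the first three columns of every valid record."""
    records = []
    for line in lines:
        parsed = parse_bed_line(line)
        if parsed is not None:
            records.append("\t".join(map(str, parsed)))
    return records


if __name__ == "__main__":
    example = ["track name=demo", "chr1\t100\t200\tpeak1\t5", "chr2 5 10"]
    print("\n".join(to_minimal_bed(example)))
```

Writing the returned lines to a new file gives a three-column BED file that is easy to inspect before submission.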

Results usually come back very quickly!

First you get three plots. They are all quite standard, so just skim through them:

Number of associated genes per region

Binned by orientation and distance to TSS

Binned by absolute distance to TSS
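To make the last plot more concrete, here is a rough sketch of how peak-to-TSS distances could be grouped into bins. The bin edges of 5, 50, and 500 kb are an assumption on my part, and the function names are hypothetical; this is not GREAT's own code.

```python
from collections import Counter

# Assumed bin edges in kilobases; adjust them to match the plot you are reading.
BIN_EDGES_KB = (5, 50, 500)


def distance_bin(distance_bp):
    """Map a signed distance to TSS (in bp) to an absolute-distance bin label."""
    kb = abs(distance_bp) / 1000
    lower = 0
    for upper in BIN_EDGES_KB:
        if kb <= upper:
            return f"{lower}-{upper} kb"
        lower = upper
    return f">{lower} kb"


def bin_counts(distances_bp):
    """Count how many distances fall into each bin."""
    return Counter(distance_bin(d) for d in distances_bp)


if __name__ == "__main__":
    print(bin_counts([100, -7000, 7000, 2_000_000]))
```

Dropping the `abs()` call and keeping the sign would give the orientation-aware version of the same binning.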

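Next come the pathway and GO annotations.

The site offers a large and fairly comprehensive collection of pathway resources, including KEGG, BioCarta, Reactome, and MSigDB, along with some signatures and gene families, so most downstream analyses can be done in one stop.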

GO Molecular Function (no terms)

GO Biological Process (no terms)

GO Cellular Component (no terms)

The test set of 5,225 genomic regions picked 2,992 (17%) of all 18,041 genes.
GO Molecular Function has 3,688 terms covering 15,090 (84%) of all 18,041 genes, and 189,388 term - gene associations.

3,688 ontology terms (100%) were tested using an annotation count range of [1, Inf].
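The percentages in this summary are simple ratios against the 18,041-gene background. A quick check, assuming GREAT rounds to the nearest whole percent (the helper name is hypothetical):

```python
def percent(part, whole):
    """Return part/whole as a whole-number percentage."""
    return round(100 * part / whole)


TOTAL_GENES = 18_041

if __name__ == "__main__":
    # Genes picked by the 5,225 test regions.
    print(percent(2_992, TOTAL_GENES))
    # Genes covered by GO Molecular Function terms.
    print(percent(15_090, TOTAL_GENES))
```

Both values agree with the 17% and 84% reported in the summary.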

The test set of 5,225 genomic regions picked 2,992 (17%) of all 18,041 genes.
GO Biological Process has 10,440 terms covering 15,441 (86%) of all 18,041 genes, and 950,065 term - gene associations.

10,440 ontology terms (100%) were tested using an annotation count range of [1, Inf].

Mouse Phenotype (no terms)

Human Phenotype (no terms)

Disease Ontology (no terms)

MSigDB Cancer Neighborhood (no terms)

Placenta Disorders (no terms)

PANTHER Pathway (no terms)

BioCyc Pathway (no terms)

MSigDB Pathway (no terms)

MGI Expression: Detected (no terms)

MSigDB Perturbation (no terms)

MSigDB Predicted Promoter Motifs (no terms)

MSigDB miRNA Motifs (no terms)

InterPro (no terms)

HGNC Gene Families (no terms)

MSigDB Oncogenic Signatures (no terms)

MSigDB Immunologic Signatures (no terms)

The test set of 5,225 genomic regions picked 2,992 (17%) of all 18,041 genes.
MSigDB Immunologic Signatures has 1,910 terms covering 16,609 (92%) of all 18,041 genes, and 363,333 term - gene associations.

1,910 ontology terms (100%) were tested using an annotation count range of [1, Inf].

Visualizing ChIP-seq peak results with the web-based tool ChIPseek

ulwvfje — Thu, 07 Jul 2016 12:56:10 +0000

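Once a ChIP-seq run is finished, and provided the experimental design is sound and the sequencing quality is acceptable, it is easy to call peaks that meet your criteria from the reads. Alternatively, you can download plenty of meaningful peak files from published papers or from Roadmap, usually in BED format. The next step is to annotate and visualize these peaks, and for this I strongly recommend ChIPseek, a web-based tool developed by researchers in Taiwan: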

The tool's home page shows eight figures that illustrate what the software can do: http://chipseek.cgu.edu.tw/index_show.py

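Under the hood, the tool simply calls HOMER and BEDTools in the background, so that biologists who do not program can make sense of their ChIP-seq results more quickly and conveniently. Its features include: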

annotate the peaks
link to UCSC genome browser
provide pie charts, histograms and bar charts for peak location distribution
apply filter criteria by peak length to get a subset of peaks
apply filter criteria by distance to nearest TSS to get a subset of peaks
apply filter criteria by location of the peaks
apply filter criteria by list(s) of genes
apply filter criteria by GO terms
apply filter criteria by KEGG pathway annotations
compare two datasets
compare dataset with ENCODE transcription factor dataset
identify enriched motif
plot peaks on chromosome ideograms
allow users to download figures or tables
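As an illustration of one item from this list, filtering peaks by length takes only a few lines in a script. This is a sketch with hypothetical names (`Peak`, `filter_by_length`); it does not reproduce how ChIPseek implements the filter.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive

    @property
    def length(self):
        return self.end - self.start


def filter_by_length(peaks, min_len=0, max_len=None):
    """Return the peaks whose length lies within [min_len, max_len]."""
    kept = []
    for peak in peaks:
        if peak.length < min_len:
            continue
        if max_len is not None and peak.length > max_len:
            continue
        kept.append(peak)
    return kept


if __name__ == "__main__":
    peaks = [Peak("chr1", 100, 250), Peak("chr1", 1000, 3000), Peak("chr2", 0, 50)]
    for peak in filter_by_length(peaks, min_len=100, max_len=1000):
        print(peak.chrom, peak.start, peak.end, peak.length)
```

The same pattern extends to the other filters in the list, for example by adding a distance-to-TSS field and filtering on it.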

Most of these features can also be implemented with your own scripts, so I won't go into detail.

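Usage is very simple:

First, open the analysis page: http://chipseek.cgu.edu.tw/analysis_form.php

Then upload the peak files you want to analyze,

for example those in GSE50177_RAW.tar from GSE50177: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50177

I tested it with four peak files.

After you submit the job, the files are uploaded and the page gives you a job ID. If you are reading this post within a month of its publication, you can use my job ID to view the results directly without uploading anything yourself. Of course, in the end you will want to analyze your own peaks.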

ChIPseek is annotating your file(s).

This page will automatically refresh every 60 seconds.

Alternatively, You may use the job ID: 1467890358.407 to visit ChIPseek latter.

The results appear after a short while. Because the server behind this web tool has limited capacity, they stay available for only one month.

http://chipseek.cgu.edu.tw/main_menu.py?job_id=1467890358.407
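If you keep several job IDs around, the result link can be rebuilt from the ID. The sketch below only mirrors the URL pattern visible in the link above; that the pattern holds for every job is an assumption, and `result_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Base address taken from the result link shown in this post.
RESULT_PAGE = "http://chipseek.cgu.edu.tw/main_menu.py"


def result_url(job_id):
    """Build the result-page URL for a ChIPseek job ID (assumed URL pattern)."""
    return f"{RESULT_PAGE}?{urlencode({'job_id': job_id})}"


if __name__ == "__main__":
    print(result_url("1467890358.407"))
```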

GSM1278641_Xu_MUT_rep1_BAF155_MUT (a total of 6733 peaks) (Download all annotation results)

GSM1278643_Xu_MUT_rep2_BAF155_MUT (a total of 3625 peaks) (Download all annotation results)

GSM1278645_Xu_WT_rep1_BAF155 (a total of 10987 peaks) (Download all annotation results)

GSM1278647_Xu_WT_rep2_BAF155 (a total of 5225 peaks) (Download all annotation results)

Every peak in every file is annotated, and the results can be downloaded, with links, as tab-delimited plain-text files, which may be easier to read in Excel.
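The downloaded tables can also be summarized with a short script instead of Excel. The sketch below tallies annotation categories from a tab-delimited file. The column name `Annotation` and the `category (details)` value layout are assumptions modeled on HOMER-style output, so check the header of your own download first.

```python
import csv
import io
from collections import Counter


def count_annotation_categories(handle, column="Annotation"):
    """Count peaks per annotation category in a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        # Assumed layout: "intron (NM_..., intron 2 of 5)" -> category "intron".
        category = value.split(" (")[0] if value else "unannotated"
        counts[category] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up table standing in for a downloaded annotation file.
    sample = (
        "PeakID\tAnnotation\n"
        "p1\tintron (NM_1, intron 2 of 5)\n"
        "p2\tIntergenic\n"
        "p3\tintron (NM_2, intron 1 of 3)\n"
    )
    print(count_annotation_categories(io.StringIO(sample)))
```

For a real file, pass `open(path, newline="")` in place of the `io.StringIO` object. Counts of this kind are the sort of numbers that sit behind a peak-location pie chart or bar chart.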

There are also four plots that are likely to be of interest:

Peak location (pie chart)

Peak location (bar chart)

Distance to TSS

Peak length distribution

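It can also convert the uploaded BED peak regions into FASTA sequences (Peak sequences).

In essence this just extracts sequences from the reference genome by coordinate. I downloaded all of the sequences, which can be used directly for motif finding.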

$ ls -lh *fasta
-rw-r--r-- 1 Jimmy 197121 18M Jul 7 19:40 GSM1278641_Xu_MUT_rep1_BAF155_MUT_sequence.fasta
-rw-r--r-- 1 Jimmy 197121 9.9M Jul 7 19:38 GSM1278643_Xu_MUT_rep2_BAF155_MUT_sequence.fasta
-rw-r--r-- 1 Jimmy 197121 26M Jul 7 19:41 GSM1278645_Xu_WT_rep1_BAF155_sequence.fasta
-rw-r--r-- 1 Jimmy 197121 14M Jul 7 19:41 GSM1278647_Xu_WT_rep2_BAF155_sequence.fasta
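The coordinate-based extraction mentioned above can be sketched in a few lines. This toy version keeps the genome in an in-memory dict and uses a `>chrom:start-end` header, both of which are assumptions for illustration; real genomes are usually accessed through an indexed FASTA file, and ChIPseek's actual header format may differ.

```python
def extract_sequence(genome, chrom, start, end):
    """Return genome[chrom][start:end], treating BED coordinates as 0-based, half-open."""
    sequence = genome[chrom]
    if not 0 <= start < end <= len(sequence):
        raise ValueError(f"interval {chrom}:{start}-{end} is outside the chromosome")
    return sequence[start:end]


def peaks_to_fasta(genome, peaks):
    """Render (chrom, start, end) peaks as FASTA text."""
    lines = []
    for chrom, start, end in peaks:
        lines.append(f">{chrom}:{start}-{end}")
        lines.append(extract_sequence(genome, chrom, start, end))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy_genome = {"chr1": "ACGTACGTAC"}  # made-up sequence
    print(peaks_to_fasta(toy_genome, [("chr1", 2, 6)]), end="")
```

The half-open convention matters here: a BED record with start 2 and end 6 covers four bases, not five.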
