对象何必到处乱找,自己创造即可

以下代码需要加载 GSEABase 包:

读取gmt文件看看GeneSetCollection 对象

自己去下载 h.all.v7.2.symbols.gmt 文件哦,在msigdb数据库可以找到。

代码如下:

gmtfile ='../MSigDB/symbols/h.all.v7.2.symbols.gmt'
geneset <- getGmt( gmtfile ) 
geneset

得到了 GeneSetCollection 对象 :

> geneset
GeneSetCollection
 names: HALLMARK_TNFA_SIGNALING_VIA_NFKB, HALLMARK_HYPOXIA, ..., HALLMARK_PANCREAS_BETA_CELLS (50 total)
 unique identifiers: JUNB, CXCL2, ..., SRP14 (4383 total)
 types in collection:
 geneIdType: NullIdentifier (1 total)
 collectionType: NullCollection (1 total)

起初看到 GeneSetCollection 对象肯定是一脸懵逼的,不过我们可以慢慢来熟悉这个 GeneSetCollection 对象哦。

看看 GeneSetCollection 的帮助文档

GeneSetCollection-methods {GSEABase} R Documentation

Methods to construct GeneSetCollection instances

Description
Use GeneSetCollection to construct a collection of gene sets from GeneSet arguments, or a list of GeneSets.

Usage
GeneSetCollection(object, ..., idType, setType)

附带一个很精彩的例子:

## Not run: 
## from KEGG identifiers, for example
library(KEGG.db)
lst <- head(as.list(KEGGEXTID2PATHID))
gsc <- GeneSetCollection(mapply(function(geneIds, keggId) {
 GeneSet(geneIds, geneIdType=EntrezIdentifier(),
 collectionType=KEGGCollection(keggId),
 setName=keggId)
}, lst, names(lst)))

也就是说,只需要自己构建好一个list,如下所示:

> lst
$`10`
[1] "hsa00232" "hsa00983" "hsa01100"

$`100`
[1] "hsa00230" "hsa01100" "hsa05340"

$`1000`
[1] "hsa04514" "hsa05412"

就可以很方便的构建好我们需要的 GeneSetCollection 对象 :

> gsc
GeneSetCollection
 names: 10, 100, ..., 100000110 (6 total)
 unique identifiers: hsa00232, hsa00983, ..., dre04080 (44 total)
 types in collection:
 geneIdType: EntrezIdentifier (1 total)
 collectionType: KEGGCollection (1 total)

有意思的是,作者这里举例,居然是把基因作为基因集,但是kegg的pathway当做是基因来处理,不过问题不大哦。

那么我们自己创造

首先需要自己拿到感兴趣的基因集,我这里呢,用线粒体核糖体基因举例:

library(org.Hs.eg.db)
s=unique(toTable(org.Hs.egSYMBOL)[,2])
head(s)
mt.genes <- s[grep("^MT",s)];
head(mt.genes) # 这里有问题
rb.genes <- s[grep("^RP[SL]",s)];
head(rb.genes)
lst=list(mt.genes=mt.genes,
 rb.genes=rb.genes )
gsc <- GeneSetCollection(mapply(function(geneIds, keggId) {
 GeneSet(geneIds, geneIdType=EntrezIdentifier(),
 collectionType=KEGGCollection(keggId),
 setName=keggId)
}, lst, names(lst)))
gsc

得到如下所示的:

> gsc
GeneSetCollection
 names: mt.genes, rb.genes (2 total)
 unique identifiers: MTOR, MT1A, ..., RPS6KB2-AS1 (2666 total)
 types in collection:
 geneIdType: EntrezIdentifier (1 total)
 collectionType: KEGGCollection (1 total)

文末友情推荐

Comments are closed.