gskb: a gene set database for mouse

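Gene Set Knowledgebase (GSKB) is modeled closely on MSigDB (Molecular Signatures Database), the gene set collection that the Broad Institute maintains for GSEA. GSKB itself is not a Broad product; it comes from the Ge lab, whose site (ge-lab.org) also hosts the example data used later in this post. Like MSigDB, it groups its gene sets into categories, seven in this case: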

  • Gene Ontology
  • Curated pathways
  • Metabolic Pathways
  • Transcription Factor (TF)
  • microRNA target genes
  • Location (cytogenetic band)
  • Others

The gene sets were collected and curated from more than 40 different knowledge databases, for a total of 33,261 gene sets.

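Installing the gskb R package

Install the package and open its PDF vignette:

## biocLite() has been retired; BiocManager is the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
## optional: use the USTC mirror of Bioconductor
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
BiocManager::install("gskb")
## PGSEA is needed for the analysis further down; whether it is available
## depends on your Bioconductor version
BiocManager::install("PGSEA")
library(gskb)
browseVignettes("gskb")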

Latest documentation: https://bioconductor.org/packages/release/data/experiment/html/gskb.html

Exploring the built-in datasets

The package ships seven datasets:

mm_GO          Gene Ontology Data for Mouse
mm_location    Chromosomal Location Data for Mouse
mm_metabolic   Metabolic Pathways Data for Mouse
mm_miRNA       miRNA Target Genes Data for Mouse
mm_other       Other Data for Mouse
mm_pathway     Pathway Data for Mouse
mm_TF          Transcription Factor Target Genes Data for Mouse

Each one can be loaded and inspected separately, for example:

library(gskb)
data(mm_miRNA)
mm_miRNA[[1]][1:10]

Each gene set in the package is stored as a character vector in the following format:

 [1] "MIRNA_MM_BETEL_MMU-LET-7A"                                                                        
 [2] "BETEL_MMU-LET-7A; Good mirSVR score Conserved; The microRNA.org resource: targets and expression."
 [3] "NSUN4"                                                                                            
 [4] "DCX"                                                                                              
 [5] "KCNK6"                                                                                            
 [6] "PBX1"                                                                                             
 [7] "PHF8"                                                                                             
 [8] "RACGAP1"                                                                                          
 [9] "EFHD2"                                                                                            
[10] "DCBLD2"

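Note that the first two elements are not genes: the first is the gene set name and the second is its description. Keep this in mind when counting or extracting the genes of a set.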
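
As a minimal sketch of how the name, description, and genes could be separated, the snippet below works on a small hand-made list laid out like the gskb data. The `toy_sets` object and the `TOY_SET_2` entry are made up for illustration; with the real package you would presumably apply the same steps to `mm_miRNA` or one of the other datasets.

```r
# Toy gene sets laid out like the gskb lists. The first set copies part of the
# example shown above; the second set is entirely hypothetical.
toy_sets <- list(
  c("MIRNA_MM_BETEL_MMU-LET-7A",
    "BETEL_MMU-LET-7A; Good mirSVR score Conserved; The microRNA.org resource: targets and expression.",
    "NSUN4", "DCX", "KCNK6", "PBX1"),
  c("TOY_SET_2", "Hypothetical description", "PHF8", "RACGAP1", "EFHD2")
)

# Element 1 is the set name, element 2 the description, the rest are genes.
set_names <- vapply(toy_sets, function(x) x[1], character(1))
set_desc  <- vapply(toy_sets, function(x) x[2], character(1))
gene_list <- lapply(toy_sets, function(x) x[-(1:2)])
names(gene_list) <- set_names

# Number of genes per set, with the two header elements excluded
lengths(gene_list)
```

Keeping the genes in a named list makes it easy to look up a set by name or to filter sets by size.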

Differential analysis of gene sets

library(PGSEA)
library(gskb)
data(mm_miRNA)
gse <- read.csv("http://ge-lab.org/gskb/GSE40261.csv", header=TRUE, row.names=1)
# Center each gene on its mean expression across samples
gse <- gse - apply(gse, 1, mean)

pg <- PGSEA(gse, cl=mm_miRNA, range=c(15,2000), p.value=NA)
# Remove gene sets whose scores are all NA, which can happen when a set has
# too few matching genes.
pg2 <- pg[rowSums(is.na(pg)) != ncol(gse), ]
# mm_miRNA contains 1868 gene sets; 1668 remain after this filter.
# Compute the difference in average Z score between the two sample groups
# and rank the gene sets by its absolute value.
zdiff <- abs(apply(pg2[,1:4], 1, mean) - apply(pg2[,5:8], 1, mean))
pg2 <- pg2[order(-zdiff), ]

sub <- factor(c(rep("Control",4), rep("Anti-miR-29",4)))
smcPlot(pg2[1:15,], sub, scale=c(-12,12), show.grid=TRUE, margins=c(1,1,7,19), col=.rwb)
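
To give a feel for what a per-sample gene set Z score is, here is a rough sketch of the PAGE-style statistic that parametric gene set enrichment methods are built on. It is an assumption about the general idea, not a copy of what `PGSEA()` does internally; the function `page_z` and the simulated data are hypothetical. The `range` argument mimics `range=c(15,2000)` above: a set with too few or too many matching genes gets `NA`.

```r
# PAGE-style Z score for one gene set in one sample (illustrative only).
# expr:  named numeric vector of centered expression values for one sample
# genes: character vector of gene symbols in the set
# range: minimum and maximum number of matching genes allowed
page_z <- function(expr, genes, range = c(15, 2000)) {
  hits <- expr[names(expr) %in% genes]
  m <- length(hits)
  if (m < range[1] || m > range[2]) return(NA_real_)
  # Compare the set mean with the mean of all genes, scaled by set size
  (mean(hits) - mean(expr)) * sqrt(m) / sd(expr)
}

set.seed(1)
expr <- setNames(rnorm(1000), paste0("G", 1:1000))
expr[1:20] <- expr[1:20] + 2                  # shift a hypothetical 20-gene set upward

z_shifted <- page_z(expr, paste0("G", 1:20))  # set of 20 genes, inside the range
z_small   <- page_z(expr, paste0("G", 1:5))   # only 5 genes, below the minimum
```

The `NA` returned for the small set is the kind of value that the `rowSums(is.na(pg))` filter above removes.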

The dataset comes from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40261, published in 2012, a study of hepatic gene expression changes following antisense oligonucleotide-based inhibition of miR-29a.

The samples in this expression matrix are:

> colnames(gse)
[1] "GSM989360_Control1"         "GSM989361__Control2"        "GSM989362_Control3"        
[4] "GSM989363__Control4"        "GSM989364_Anti.miR.29_rep1" "GSM989365_Anti.miR.29_rep1"
[7] "GSM989366_Anti.miR.29_rep3" "GSM989367_Anti.miR.29_rep4"

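With a score for every gene set in every sample, plus the sample annotations, you are free to carry out whatever downstream analysis you like.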
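
One possible downstream step, sketched here on simulated numbers rather than the real `pg2` matrix, is a per-gene-set t-test between the two groups followed by multiple-testing correction. The `scores` matrix, its row and column names, and the planted difference in `SET_1` are all made up for illustration.

```r
set.seed(42)
# Simulated stand-in for a gene set score matrix: 5 gene sets x 8 samples
scores <- matrix(rnorm(40), nrow = 5,
                 dimnames = list(paste0("SET_", 1:5), paste0("S", 1:8)))
scores["SET_1", 5:8] <- scores["SET_1", 5:8] + 10   # plant one clear difference
group <- factor(c(rep("Control", 4), rep("Anti-miR-29", 4)))

# Welch t-test for each gene set, then Benjamini-Hochberg adjustment
p_raw <- apply(scores, 1, function(z) t.test(z ~ group)$p.value)
p_adj <- p.adjust(p_raw, method = "BH")

result <- data.frame(p.value = p_raw, adj.p = p_adj)
result[order(result$p.value), ]
```

With only four samples per group the test has little power, so a moderated approach such as limma may be a better fit for real data.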
