使用PGSEA包进行基因集分析

GSEA 相信看过我生信菜鸟团博客的朋友都已经耳熟能详了的，其需要样本的描述以及分组信息，虽然有ssGSEA这样的单样本的分析，但仍然不够，也有GSVA这样的算法来弥补，这里要介绍的是另外一个包，PGSEA。

it tests for each sample whether the average expression of genes in a gene sets deviates from the overall average expression (expression of all genes in all samples).

安装PGSEA这个R包

安装并且查看 PDF教程：

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
biocLite("PGSEA")
library(PGSEA)
browseVignettes("PGSEA")

重点就是PGSEA函数及smcPlot函数，前者根据表达矩阵及基因集来进行GSEA分析，后者用来可视化分析后的结果。

因为PGSEA分析后的结果是每个基因集在每个样本的一个score，所以也是一个表达矩阵，也可以进行limma的差异分析流程。

执行PGSEA分析及可视化

library(PGSEA)
library(gskb)
data(mm_miRNA)
gse<-read.csv("http://ge-lab.org/gskb/GSE40261.csv",header=TRUE, row.name=1)
# Gene are centered by mean expression
gse <- gse - apply(gse,1,mean)  

## 保证表达矩阵和基因集的基因名字是对应的即可。

d1 <- scan("http://ge-lab.org/gskb/2-MousePath/MousePath_Co-expression_gmt.gmt", what="", sep="\n", skip=1)
mm_Co_expression <- strsplit(d1, "\t")
names(mm_Co_expression) <- sapply(mm_Co_expression, '[[', 1)

pg <- PGSEA(gse, cl=mm_Co_expression, range=c(15,2000), p.value=NA)

# Remove pathways that has all NAs. This could be due to that pathway has
# too few matching genes. 
pg2 <- pg[rowSums(is.na(pg))!= dim(gse)[2], ]

# Difference in Average Z score in two groups of samples is calculated and 
# the pathways are ranked by absolute value.
diff <- abs( apply(pg2[,1:4],1,mean) - apply(pg2[,5:8], 1, mean) )
pg2 <- pg2[order(-diff), ]  

sub <- factor( c( rep("Control",4),rep("Anti-miR-29",4) ) ) 

smcPlot(pg2[1:15,],sub,scale=c(-12,12),show.grid=TRUE,margins=c(1,1,7,19),col=.rwb)

分析结果也可以走limma流程

library(PGSEA)
library(GEOquery)
library(GSEABase)
gse <- getGEO("GSE7023",GSEMatrix=TRUE)
#load("gse.rda")
subtype <- gsub("\\.", "_",gsub("subtype: ", "", phenoData(gse[[1]])$"characteristics_ch1"))
pheno <- new("AnnotatedDataFrame", data = data.frame(subtype), varMetadata = data.frame(labelDescription="subtype"))
rownames(pheno@data) <- colnames(exprs(gse[[1]]))
eset <- new("ExpressionSet", exprs = exprs(gse[[1]]), phenoData = pheno)

data(VAIgsc)
details(VAIgsc[[1]])

pgNF <- PGSEA(eset, VAIgsc, ref=which(subtype=="NO"), p.value=NA)

library(limma)

design <- model.matrix(~ -1+factor(subtype))
colnames(design) <- names(table(subtype))
fit <- lmFit(pgNF, design)
contrast.matrix <- makeContrasts(P2B-NO , levels=design)
fit <- contrasts.fit(fit, contrast.matrix)
fit <- eBayes(fit)

topTable(fit, n=10)[,c("logFC","t","adj.P.Val")]

一	二	三	四	五	六	日
« 九
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31