使用R包genefu来根据基因集进行表达谱分类

学习使用genefu这个包,首先需要安装它!

source("http://bioconductor.org/biocLite.R")
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
biocLite("genefu")

教程略微有点复杂:https://rdrr.io/bioc/genefu/f/inst/doc/genefu.pdf

这个包自带了5个乳腺癌芯片数据集,需要了解他们,可以看我在生信技能树发的教程:https://vip.biotrainee.com/d/689-5

包里面自带的数据集也非常多,可以在https://rdrr.io/bioc/genefu/man/ 查看列表,比较重要的如下:

gene70 Function to compute the 70 genes prognosis profile (GENE70)…
gene76 Function to compute the Relapse Score as published by Wang et…
mod1 Gene modules published in Desmedt et al. 2008
mod2 Gene modules published in Wirapati et al. 2008
scmgene.robust Subtype Clustering Model using only ESR1, ERBB2 and AURKA…
scmod1.robust Subtype Clustering Model using ESR1, ERBB2 and AURKA modules…
scmod2.robust Subtype Clustering Model using ESR1, ERBB2 and AURKA modules…
sig.endoPredict Signature used to compute the endoPredict signature as…
sig.gene70 Signature used to compute the 70 genes prognosis profile…
sig.gene76 Signature used to compute the Relapse Score (GENE76) as…
sig.genius Gene Expression progNostic Index Using Subtypes (GENIUS) as…
sig.ggi Gene expression Grade Index (GGI) as published in Sotiriou et…
sig.oncotypedx Signature used to compute the OncotypeDX signature as…
pam50 PAM50 classifier for identification of breast cancer…
pik3cags Function to compute the PIK3CA gene signature (PIK3CA-GS)

上面列出的这些数据集都是可以打开看的:

library(genefu)
data(pam50)
data(pam50.scale)
data(pam50.robust)
data(scmod2.robust)
pam50
str(scmod2.robust, max.level=1)

最重要的功能就是根据已知基因集来对乳腺癌进行分子分型

所有的分型都是用molecular.subtyping函数,预背了很多可以进行乳腺癌进行分子分型的基因集,比如大名鼎鼎的PAM50,下面是演示:


rm(list = ls())
library(breastCancerMAINZ)
library(breastCancerTRANSBIG)
library(breastCancerUPP)
library(breastCancerUNT)
library(breastCancerNKI)
data(transbig)
dd=transbig
cinfo <- colnames(pData(dd))
cinfo
ddata <- t(exprs(dd))
dannot <- featureData(dd)@data
ddemo <- phenoData(dd)@data

library(genefu)
SubtypePredictions<-molecular.subtyping(sbt.model = "scmod2",data = ddata,
 annot = dannot,do.mapping = TRUE)
table(SubtypePredictions$subtype)
ddemo$SCMOD2<-SubtypePredictions$subtype

PAM50Preds<-molecular.subtyping(sbt.model = "pam50",data=ddata,
 annot=dannot,do.mapping=TRUE)

table(PAM50Preds$subtype)
ddemo$PAM50<-PAM50Preds$subtype

分类结果如下:

> table(SubtypePredictions$subtype)

ER+/HER2- High Prolif ER+/HER2- Low Prolif ER-/HER2- HER2+ 
 50 77 49 22

> table(PAM50Preds$subtype)

Basal Her2 LumB LumA Normal 
 45 26 45 78 4

需要懂这个数据集才能理解,还需要有一点乳腺癌研究背景知识。

成功分类后的信息,就可以用来做生存分析

# http://www.inside-r.org/r-doc/survival/survfit.coxph
library(survival) 
data.for.survival.SCMOD2 <- ddemo[,c("e.os", "t.os", "SCMOD2","age")]
data.for.survival.PAM50 <- ddemo[,c("e.os", "t.os", "PAM50","age")]
# Remove patients with missing survival information
data.for.survival.SCMOD2 <- data.for.survival.SCMOD2[complete.cases(data.for.survival.SCMOD2),]
data.for.survival.PAM50 <- data.for.survival.PAM50[complete.cases(data.for.survival.PAM50),]
days.per.month <- 30.4368
days.per.year <- 365.242
data.for.survival.PAM50$months_to_death <- data.for.survival.PAM50$t.os / days.per.month
data.for.survival.PAM50$vital_status <- data.for.survival.PAM50$e.os == "1"
surv.obj.PAM50 <- survfit(Surv(data.for.survival.PAM50$months_to_death,
 data.for.survival.PAM50$vital_status) ~ data.for.survival.PAM50$PAM50)
data.for.survival.SCMOD2$months_to_death <- data.for.survival.SCMOD2$t.os / days.per.month
data.for.survival.SCMOD2$vital_status <- data.for.survival.SCMOD2$e.os == "1"
surv.obj.SCMOD2 <- survfit(Surv(
 data.for.survival.SCMOD2$months_to_death,
 data.for.survival.SCMOD2$vital_status) ~ data.for.survival.SCMOD2$SCMOD2)
message("KAPLAN-MEIR CURVE - USING PAM50")
plot(main = "Surival Curves SCMOD2", surv.obj.SCMOD2,
 col =c("#006d2c", "#8856a7","#a50f15", "#08519c"),lty = 1,lwd = 3,
 xlab = "Time (months)",ylab = "Probability of Survival") 
legend("topright",fill = c("#006d2c", "#8856a7","#a50f15", "#08519c"), 
 legend = c("Basal","Her2","LumA","LumB"),bty = "n")

plot(main = "Surival Curves PAM50", surv.obj.PAM50,
 col =c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),lty = 1,lwd = 3,

 xlab = "Time (months)",ylab = "Probability of Survival")
legend("topright",
 fill = c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),
 legend = c("Basal","Her2","LumA","LumB","Normal"),bty = "n")

也可以把这两个分类画在一张图上,或者添加统计学显著指标,你们自己尝试一下哦,我在生信技能树论坛也多次提到过,就不再赘述咯。

genefu-surival-pam50-transbig genefu-surival-scmod2-transbig

提示一下,可以使用 Cross-validated Partial Likelihood (cvpl) 模型来检验两个分类方法的预后判断情况。

更多分类标准

前面我们提到过,这个包最大的优点就是内置了一系列分类指标,如下;

Subtype Clustering Model using just the AURKA gene: scmgene.robust()
Subtype Clustering Model using just the ESR1 gene: scmgene.robust() Subtype Clustering Model using just the ERBB2 gene: scmgene.robust() 
NPI: npi()
GGI: ggi()
GENIUS: genius() 
EndoPredict: endoPredict() 
OncotypeDx: oncotypedx() 
TamR: tamr()
GENE70: gene70() 
PIK3CA: pik3cags()
rorS: rorS()

基本上是,可以想怎么分类,就怎么分类了。

genefu-classification


 

Comments are closed.