学习使用`genefu`这个包，首先需要安装它！

source("http://bioconductor.org/biocLite.R")
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
biocLite("genefu")

教程略微有点复杂：https://rdrr.io/bioc/genefu/f/inst/doc/genefu.pdf

这个包自带了5个乳腺癌芯片数据集，需要了解他们，可以看我在生信技能树发的教程：https://vip.biotrainee.com/d/689-5

包里面自带的数据集也非常多，可以在https://rdrr.io/bioc/genefu/man/ 查看列表，比较重要的如下：

gene70	Function to compute the 70 genes prognosis profile (GENE70)…
gene76	Function to compute the Relapse Score as published by Wang et…

mod1	Gene modules published in Desmedt et al. 2008
mod2	Gene modules published in Wirapati et al. 2008

scmgene.robust	Subtype Clustering Model using only ESR1, ERBB2 and AURKA…
scmod1.robust	Subtype Clustering Model using ESR1, ERBB2 and AURKA modules…
scmod2.robust	Subtype Clustering Model using ESR1, ERBB2 and AURKA modules…

sig.endoPredict	Signature used to compute the endoPredict signature as…
sig.gene70	Signature used to compute the 70 genes prognosis profile…
sig.gene76	Signature used to compute the Relapse Score (GENE76) as…
sig.genius	Gene Expression progNostic Index Using Subtypes (GENIUS) as…
sig.ggi	Gene expression Grade Index (GGI) as published in Sotiriou et…
sig.oncotypedx	Signature used to compute the OncotypeDX signature as…

pam50	PAM50 classifier for identification of breast cancer…
pik3cags	Function to compute the PIK3CA gene signature (PIK3CA-GS)

上面列出的这些数据集都是可以打开看的：

library(genefu)
data(pam50)
data(pam50.scale)
data(pam50.robust)
data(scmod2.robust)
pam50
str(scmod2.robust, max.level=1)

最重要的功能就是根据已知基因集来对乳腺癌进行分子分型

所有的分型都是用molecular.subtyping函数，预背了很多可以进行乳腺癌进行分子分型的基因集，比如大名鼎鼎的PAM50，下面是演示：


rm(list = ls())
library(breastCancerMAINZ)
library(breastCancerTRANSBIG)
library(breastCancerUPP)
library(breastCancerUNT)
library(breastCancerNKI)
data(transbig)
dd=transbig
cinfo <- colnames(pData(dd))
cinfo
ddata <- t(exprs(dd))
dannot <- featureData(dd)@data
ddemo <- phenoData(dd)@data

library(genefu)
SubtypePredictions<-molecular.subtyping(sbt.model = "scmod2",data = ddata,
 annot = dannot,do.mapping = TRUE)
table(SubtypePredictions$subtype)
ddemo$SCMOD2<-SubtypePredictions$subtype

PAM50Preds<-molecular.subtyping(sbt.model = "pam50",data=ddata,
 annot=dannot,do.mapping=TRUE)

table(PAM50Preds$subtype)
ddemo$PAM50<-PAM50Preds$subtype

分类结果如下：

> table(SubtypePredictions$subtype)

ER+/HER2- High Prolif ER+/HER2- Low Prolif ER-/HER2- HER2+ 
 50 77 49 22

> table(PAM50Preds$subtype)

Basal Her2 LumB LumA Normal 
 45 26 45 78 4

需要懂这个数据集才能理解，还需要有一点乳腺癌研究背景知识。

成功分类后的信息，就可以用来做生存分析

# http://www.inside-r.org/r-doc/survival/survfit.coxph
library(survival) 
data.for.survival.SCMOD2 <- ddemo[,c("e.os", "t.os", "SCMOD2","age")]
data.for.survival.PAM50 <- ddemo[,c("e.os", "t.os", "PAM50","age")]
# Remove patients with missing survival information
data.for.survival.SCMOD2 <- data.for.survival.SCMOD2[complete.cases(data.for.survival.SCMOD2),]
data.for.survival.PAM50 <- data.for.survival.PAM50[complete.cases(data.for.survival.PAM50),]
days.per.month <- 30.4368
days.per.year <- 365.242
data.for.survival.PAM50$months_to_death <- data.for.survival.PAM50$t.os / days.per.month
data.for.survival.PAM50$vital_status <- data.for.survival.PAM50$e.os == "1"
surv.obj.PAM50 <- survfit(Surv(data.for.survival.PAM50$months_to_death,
 data.for.survival.PAM50$vital_status) ~ data.for.survival.PAM50$PAM50)
data.for.survival.SCMOD2$months_to_death <- data.for.survival.SCMOD2$t.os / days.per.month
data.for.survival.SCMOD2$vital_status <- data.for.survival.SCMOD2$e.os == "1"
surv.obj.SCMOD2 <- survfit(Surv(
 data.for.survival.SCMOD2$months_to_death,
 data.for.survival.SCMOD2$vital_status) ~ data.for.survival.SCMOD2$SCMOD2)
message("KAPLAN-MEIR CURVE - USING PAM50")
plot(main = "Surival Curves SCMOD2", surv.obj.SCMOD2,
 col =c("#006d2c", "#8856a7","#a50f15", "#08519c"),lty = 1,lwd = 3,
 xlab = "Time (months)",ylab = "Probability of Survival") 
legend("topright",fill = c("#006d2c", "#8856a7","#a50f15", "#08519c"), 
 legend = c("Basal","Her2","LumA","LumB"),bty = "n")

plot(main = "Surival Curves PAM50", surv.obj.PAM50,
 col =c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),lty = 1,lwd = 3,

 xlab = "Time (months)",ylab = "Probability of Survival")
legend("topright",
 fill = c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),
 legend = c("Basal","Her2","LumA","LumB","Normal"),bty = "n")

也可以把这两个分类画在一张图上，或者添加统计学显著指标，你们自己尝试一下哦，我在生信技能树论坛也多次提到过，就不再赘述咯。

提示一下，可以使用 Cross-validated Partial Likelihood (cvpl) 模型来检验两个分类方法的预后判断情况。

一	二	三	四	五	六	日
« 九
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

生信菜鸟团

欢迎去论坛biotrainee.com留言参与讨论，或者关注同名微信公众号biotrainee

使用R包genefu来根据基因集进行表达谱分类

学习使用`genefu`这个包，首先需要安装它！

最重要的功能就是根据已知基因集来对乳腺癌进行分子分型

更多分类标准

学习使用genefu这个包，首先需要安装它！

最重要的功能就是根据已知基因集来对乳腺癌进行分子分型

更多分类标准

学习使用`genefu`这个包，首先需要安装它！