用R获取芯片探针与基因的对应关系三部曲-NCBI下载对应关系

这是系列文章,请先看:

用R获取芯片探针与基因的对应关系三部曲-bioconductor

ncbi现有的GPL已经过万了,但是bioconductor的芯片注释包不到一千,虽然bioconductor可以解决我们大部分的需要,比如affymetrix的95,133系列,深圳1.0st系列,HTA2.0系列,但是如果碰到比较生僻的芯片,bioconductor也不会刻意为之制作一个bioconductor的包,这时候就需要自行下载NCBI的GPL信息了,也可以通过R来解决:

##本质上是下载一个文件,读进R里面,然后解析行列式,得到芯片探针与基因的对应关系,看下面的代码,你就能理解了。

## A-AGIL-28 - Agilent Whole Human Genome Microarray 4x44K 014850 G4112F (85 cols x 532 rows)
library(Biobase)
library(GEOquery)
#Download GPL file, put it in the current directory, and load it:
gpl <- getGEO('GPL6480', destdir=".")
colnames(Table(gpl)) ## [1] 41108 17
head(Table(gpl)[,c(1,6,7)]) ## you need to check this , which column do you need
write.csv(Table(gpl)[,c(1,6,7)],"GPL6400.csv")
#platformDB='hgu133plus2.db'
#library(platformDB, character.only=TRUE)
probeset <- featureNames(GSE32575[[1]])
library(Biobase)
library(GEOquery)
#Download GPL file, put it in the current directory, and load it:
gpl <- getGEO('GPL6102', destdir=".")
colnames(Table(gpl)) ## [1] 41108 17
head(Table(gpl)[,c(1,10,13)]) ## you need to check this , which column do you need
probe2symbol=Table(gpl)[,c(1,13)]
## GPL15207 [PrimeView] Affymetrix Human Gene Expression Array
probeset <- featureNames(GSE58979[[1]])
library(Biobase)
library(GEOquery)
#Download GPL file, put it in the current directory, and load it:
gpl <- getGEO('GPL15207', destdir=".")
colnames(Table(gpl)) ## [1] 49395 24
head(Table(gpl)[,c(1,15,19)]) ## you need to check this , which column do you need
probe2symbol=Table(gpl)[,c(1,15)]

## GPL10558 Illumina HumanHT-12 V4.0 expression beadchip
library(Biobase)
library(GEOquery)
#Download GPL file, put it in the current directory, and load it:
gpl <- getGEO('GPL10558', destdir=".")
colnames(Table(gpl)) ## [1] 41108 17
head(Table(gpl)[,c(1,10,13)]) ## you need to check this , which column do you need
probe2symbol=Table(gpl)[,c(1,13)]

 

Comments are closed.