<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Microarray data</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e8%8a%af%e7%89%87%e6%95%b0%e6%8d%ae/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Reading Affymetrix gene expression array data (CEL format) with the oligo package</title>
		<link>http://www.bio-info-trainee.com/1586.html</link>
		<comments>http://www.bio-info-trainee.com/1586.html#comments</comments>
		<pubDate>Sat, 23 Apr 2016 14:58:31 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[affymetrix]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[oligo]]></category>
		<category><![CDATA[Microarray data]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1586</guid>
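		<description><![CDATA[As noted earlier, the affy package handles only a limited set of array platforms, mainly the hgu 95 and 133 series, [H &#8230; <a href="http://www.bio-info-trainee.com/1586.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>As noted earlier, the affy package handles only a limited set of array platforms, mainly the hgu 95 and 133 series. The [HuGene-1_1-st] Affymetrix Human Gene 1.1 ST Array is also an Affymetrix platform, but the affy package cannot process it. That is where the oligo package comes in!</p>
<p>oligo is one of the Bioconductor packages for R. For our purposes it does one job: it reads Affymetrix gene expression array data in CEL format and turns it into an expression matrix.</p>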
<p><span id="more-1586"></span></p>
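<p>As before, we first need to download the raw data. One example: <a href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48452/suppl/GSE48452_RAW.tar">GSE48452</a></p>
<p>After downloading, extract the archive into a directory of your choice, and oligo can be used on it directly.</p>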
<blockquote>
<div>library(oligo)</div>
<div>geneCELs=list.celfiles('<span style="color: #ff0000;"><strong>/path/GSE48452/cel_files/</strong></span>',listGzipped=TRUE,full.names=TRUE)</div>
<div># use full paths; CEL files usually come gzipped and do not need to be decompressed</div>
<div>affyGeneFS &lt;- read.celfiles(geneCELs)  ## read the CEL files</div>
<div>geneCore &lt;- rma(affyGeneFS, target = "core")  ## RMA normalization, summarized at the transcript level; this step is slow</div>
<div>genePS &lt;- rma(affyGeneFS, target = "probeset")  ## the same, summarized per probe set</div>
<div># two summarization levels; usually we choose the transcript-level one (core)</div>
<div>## for this platform you also have to attach the probe annotation to the expression sets yourself</div>
<div>featureData(genePS) &lt;- getNetAffx(genePS, "probeset")</div>
<div>featureData(geneCore) &lt;- getNetAffx(geneCore, "transcript")</div>
<div>## the probe IDs still need to be mapped to gene IDs, which is not covered here</div>
</blockquote>
<p>The resulting expression matrix should agree with the one published on the GEO website; you can check this for yourself:</p>
<p>ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE48nnn/GSE48452/matrix/GSE48452_series_matrix.txt.gz</p>
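<p>One way to run that check, sketched in base R under some assumptions: both matrices have probe IDs as row names and sample IDs as column names, and the toy values below are made up. The helper name compare_matrices is hypothetical and not part of oligo or GEOquery.</p>

```r
# Hypothetical helper: align two expression matrices on their shared
# probes and samples, then correlate each shared sample column.
compare_matrices <- function(a, b) {
  rows <- intersect(rownames(a), rownames(b))
  cols <- intersect(colnames(a), colnames(b))
  a <- a[rows, cols, drop = FALSE]
  b <- b[rows, cols, drop = FALSE]
  # Pearson correlation per sample, named by sample ID
  vapply(cols, function(s) cor(a[, s], b[, s]), numeric(1))
}

# Toy stand-in for our own RMA output (made-up values).
ours <- matrix(c(7.1, 8.2, 5.0,
                 6.3, 9.4, 4.8),
               nrow = 3,
               dimnames = list(c("p1", "p2", "p3"), c("GSM1", "GSM2")))

# Toy stand-in for the GEO series matrix: one extra probe, columns in another order.
geo <- matrix(c(6.2, 9.5, 4.9, 3.3,
                7.0, 8.3, 5.1, 3.4),
              nrow = 4,
              dimnames = list(c("p1", "p2", "p3", "p4"), c("GSM2", "GSM1")))

res <- compare_matrices(ours, geo)
print(round(res, 3))
```

<p>Correlations close to 1 for every sample suggest the two matrices agree up to small numerical differences; exact equality is not guaranteed, because the submitters may have used different processing settings.</p>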
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1586.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reading Affymetrix gene expression array data (CEL format) with the affy package</title>
		<link>http://www.bio-info-trainee.com/1580.html</link>
		<comments>http://www.bio-info-trainee.com/1580.html#comments</comments>
		<pubDate>Sat, 23 Apr 2016 14:50:46 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[affy]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[Microarray data]]></category>
		<category><![CDATA[Expression matrix]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1580</guid>
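		<description><![CDATA[An Affymetrix probe is typically a 25-base oligonucleotide; probes always &#8230; <a href="http://www.bio-info-trainee.com/1580.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>An Affymetrix probe is typically a 25-base oligonucleotide. Probes always come in perfect match / mismatch pairs, whose signal values are called PM and MM, and the paired perfect match and mismatch probes share one affyID.<br />
CEL file: signal values and position information.<br />
CDF file: the positions of the probe pairs on the array</p>
<p>affy is one of the Bioconductor packages for R. For our purposes it does one job: it reads Affymetrix gene expression array data in CEL format and turns it into an expression matrix.</p>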
<p><span id="more-1580"></span></p>
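<p>We usually go to the GEO database to find the download address of the CEL files. Take <a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1428">GSE1428</a> as an example: it profiled samples from 10 young (19-25 years old) and 12 older (70-80 years old) males in order to find differentially expressed genes. On GEO, the download address of the CEL files is:</p>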
<p>ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1428/suppl/<span style="color: #ff0000;">GSE1428_RAW.tar</span></p>
<p>We download the raw data only to demonstrate affy; GEO also provides the processed expression matrix for download.</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/1.png"><img class="alignnone size-full wp-image-1581" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/1.png" alt="1" width="290" height="201" /></a></p>
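<p>After downloading, extract the archive into a directory of your choice.</p>
<p>Once the files are on your local disk, you can read them with code:</p>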
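<blockquote><p>library(affy)<br />
# directory holding the extracted CEL files<br />
dir_cels='D:\\test_analysis\\TNBC\\cel_files'<br />
# read every CEL file in that directory into an AffyBatch<br />
affy_data = ReadAffy(celfile.path=dir_cels)<br />
# MAS5 background correction, normalization and summarization<br />
eset.mas5 = mas5(affy_data)</p></blockquote>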
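<div>Reading takes quite a while. <span style="color: #ff0000;">You can also normalize the expression data with the rma function instead of mas5</span> (rma returns log2-scale values, whereas mas5 returns values on the original scale).</div>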
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/2.png"><img class="alignnone size-full wp-image-1582" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/2.png" alt="2" width="449" height="251" /></a></div>
<div>The expression matrix after reading looks like this:</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/3.png"><img class="alignnone size-full wp-image-1583" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/3.png" alt="3" width="727" height="318" /></a></div>
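<div>In theory, the processed data should match the expression values downloaded directly from the GEO website. The download links all follow a regular pattern:</div>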
<p>ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1428/matrix/<span style="color: #ff0000;">GSE1428_series_matrix.txt.gz</span></p>
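<p>The pattern in these links can be written down as a small base R function. This is only a sketch: the helper name geo_series_urls is hypothetical, the rule "replace the last three digits with nnn" is inferred from the two URLs shown in this post, and the matrix file name is assumed to be the single-platform form.</p>

```r
# Hypothetical helper: build the GEO FTP links for a series accession,
# following the URL pattern shown above.
geo_series_urls <- function(acc) {
  stopifnot(grepl("^GSE[0-9]+$", acc))
  # Replace the last three digits with "nnn": GSE1428 -> GSE1nnn
  stub <- sub("[0-9]{1,3}$", "nnn", acc)
  base <- paste0("ftp://ftp.ncbi.nlm.nih.gov/geo/series/", stub, "/", acc)
  c(raw    = paste0(base, "/suppl/", acc, "_RAW.tar"),
    matrix = paste0(base, "/matrix/", acc, "_series_matrix.txt.gz"))
}

urls <- geo_series_urls("GSE1428")
print(urls)
```

<p>A series measured on several platforms has one matrix file per platform with a different file name, so this sketch would need adjusting for those.</p>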
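<p>Of course, the affy package supports only a limited set of array platforms,</p>
<p>mainly the hgu 95 and 133 series.</p>
<p>Strictly speaking, the expression matrix obtained from this kind of array needs to be filtered.</p>
<p>For example, with code like the following:</p>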
<p>setwd('../')<br />
library(affy)<br />
dir_cels='GSE34824_RAW'<br />
data &lt;- ReadAffy(celfile.path=dir_cels)<br />
eset &lt;- rma(data)<br />
calls &lt;- mas5calls(data) # get PMA (present/marginal/absent) calls<br />
calls &lt;- exprs(calls)<br />
absent &lt;- rowSums(calls == 'A') # number of samples in which each probe set is called 'absent'<br />
absent &lt;- absent == ncol(calls) # TRUE for probe sets that are 'absent' in all samples<br />
rmaFiltered &lt;- eset[!absent,] # drops the probe sets 'absent' in all samples; also safe when there are none</p>
<p>After filtering, 42482 of the 54675 features remain.</p>
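<p>The filtering idea can be tried on a toy call matrix with base R alone. The matrix below is made up, with "P", "M" and "A" standing for present, marginal and absent calls. It also shows why logical indexing is the safer form: a negative index built from an empty which() result drops every row.</p>

```r
# Toy present/marginal/absent calls: 4 probe sets x 3 samples (made-up values).
calls <- matrix(c("P", "A", "A", "M",
                  "P", "A", "P", "A",
                  "A", "A", "P", "A"),
                nrow = 4,
                dimnames = list(c("ps1", "ps2", "ps3", "ps4"),
                                c("s1", "s2", "s3")))

# TRUE for probe sets called absent in every sample (only ps2 here).
all_absent <- rowSums(calls == "A") == ncol(calls)

# Logical indexing keeps everything that is not absent everywhere.
kept <- rownames(calls)[!all_absent]
print(kept)

# Pitfall: when no probe set is absent everywhere, which() returns integer(0),
# and indexing with -integer(0) selects zero rows instead of all rows.
none_absent <- which(rep(FALSE, nrow(calls)))
print(nrow(calls[-none_absent, , drop = FALSE]))
```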
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1580.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A worked example of expression microarray data analysis</title>
		<link>http://www.bio-info-trainee.com/1024.html</link>
		<comments>http://www.bio-info-trainee.com/1024.html#comments</comments>
		<pubDate>Fri, 25 Sep 2015 14:53:39 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R tips]]></category>
		<category><![CDATA[Hands-on]]></category>
		<category><![CDATA[Pipeline]]></category>
		<category><![CDATA[Microarray data]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1024</guid>
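		<description><![CDATA[The first part of this example covers: how to download GEO data with an R package (single-platform series only; for other platforms you need to modify the &#8230; <a href="http://www.bio-info-trainee.com/1024.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>The first part of this example covers:</div>
<div>how to download GEO data with an R package (single-platform series only; for other platforms the code below needs to be modified),</div>
<div>how to normalize GEO microarray data and obtain an expression matrix,</div>
<div>how to run a differential expression analysis with the limma package,</div>
<div>how to annotate the resulting differentially expressed genes with GO and KEGG</div>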
<p><span id="more-1024"></span></p>
<div></div>
<div>First download two GEO datasets:</div>
<div>Platform: Affymetrix U133 gene chips</div>
<div>67 diseased triple negative breast cancer samples (GSE31519) and 42 control samples (GSE20437)</div>
<div>Both are expression data from the same array type. They fall into two groups, which is exactly what a differential expression analysis needs.</div>
<div>The publication the data comes from:</div>
<div>Article title: A clinically relevant gene signature in triple negative and basal-like breast cancer</div>
<div>Conclusion (We describe a ratio of high B-cell presence and low IL-8 activity as a powerful new prognostic marker for TNBC.)</div>
<div>URL: <a href="http://www.breast-cancer-research.com/content/13/5/R97">http://www.breast-cancer-research.com/content/13/5/R97</a></div>
<div><span style="font-size: medium;">GEO data URL: </span><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE31519">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE31519</a></div>
<div><span style="font-size: medium;">Platform: GPL96 67 Samples</span></div>
<div><span style="font-size: medium;">Download data: GEO (CEL, TXT)</span></div>
<div><span style="font-size: medium;">SeriesAccession: GSE31519  ID: 200031519</span></div>
<div></div>
<div>Article title: Histologically normal epithelium from breast cancer patients and cancer-free prophylactic mastectomy patient</p>
<div><span style="font-size: medium;">GEO data URL:  </span><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20437">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20437</a></div>
<p>Platform: GPL96 Series: GSE20437   42 Samples</p>
</div>
<div>Download data: GEO (CEL)</div>
<div>DataSetAccession: GDS3716  ID: 3716</div>
<div></div>
<div></div>
<div>I first download the data with R's GEOquery package. (You could just as well download the data directly from the GEO website and extract it.)</div>
<div>suppressMessages(library(GEOquery))</div>
<div>setwd("D:\\test_analysis\\TNBC")</div>
<div>gse31519=getGEO("GSE31519",GSEMatrix = T,destdir = "./")</div>
<div>getGEOSuppFiles("GSE31519",baseDir = "./")</div>
<div>gse20437=getGEO("GSE20437",GSEMatrix = T,destdir = "./")</div>
<div>getGEOSuppFiles("GSE20437",baseDir = "./")</div>
<div></div>
<div><span style="color: #ff0000;"><strong>After this, all the downloaded data is stored under D:\\test_analysis\\TNBC</strong></span></div>
<div>Next we use the affy and limma packages for the differential expression analysis:</div>
<div></div>
<div><span style="font-size: medium;">library(affy)</span></div>
<div><span style="font-size: medium;">library(limma)</span></div>
<div><span style="font-size: medium;">affy.data=ReadAffy(celfile.path="<span style="color: #ff0000;">./cel_files</span>")</span></div>
<div>First make sure you understand how the ReadAffy function is used:</div>
<div><span style="color: #ff0000;">Does the current working directory contain a cel_files folder?</span></div>
<div><span style="color: #ff0000;">Does the cel_files folder contain any files?</span></div>
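<p>Both questions can be checked in base R before calling ReadAffy. The sketch below is only an illustration: find_cel_files is a hypothetical helper, not part of affy, and the demo runs in a temporary directory with empty placeholder files rather than real CEL files.</p>

```r
# Hypothetical helper (not part of affy): confirm that the folder exists
# and list the CEL files in it, gzipped or not.
find_cel_files <- function(dir) {
  if (!dir.exists(dir)) stop("directory not found: ", dir)
  files <- list.files(dir, pattern = "\\.cel(\\.gz)?$", ignore.case = TRUE)
  if (length(files) == 0) stop("no CEL files in: ", dir)
  files
}

# Demo in a temporary directory with empty placeholder files.
demo_dir <- file.path(tempdir(), "cel_files")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("GSM1.CEL", "GSM2.CEL.gz", "notes.txt"))))

found <- find_cel_files(demo_dir)
print(found)
```

<p>In the real analysis you would call the helper on "./cel_files" and pass the same path to ReadAffy only if it returns without an error.</p>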
<div><span style="font-size: medium;">eset.rma=rma(affy.data)</span></div>
<div><span style="font-size: medium;">exprSet=exprs(eset.rma)</span></div>
<div><span style="font-size: medium;">write.table(exprSet,"expr_rma_matrix.txt",quote=F,sep="\t")</span></div>
<div><span style="font-size: medium;">group=factor(c(rep("control",42),rep("case",67)))</span></div>
<div><span style="font-size: medium;">design = model.matrix(~0+group)</span></div>
<div><span style="font-size: medium;">colnames(design)=c("case","control")</span></div>
<div><span style="font-size: medium;">rownames(design)=sampleNames(affy.data)</span></div>
<div><span style="font-size: medium;">fit=lmFit(exprSet,design)</span></div>
<div><span style="font-size: medium;">cont.matrix = makeContrasts(contrasts="case-control",levels=design)</span></div>
<div><span style="font-size: medium;">fit2=contrasts.fit(fit,cont.matrix)</span></div>
<div><span style="font-size: medium;">fit2=eBayes(fit2)</span></div>
<div><span style="font-size: medium;">diff_dat=topTable(fit2,coef=1,n=Inf)</span></div>
<div><span style="font-size: medium;">write.table(diff_dat,"diff_dat.txt",quote=F)</span></div>
<div><span style="font-size: medium;"> </span></div>
<div><span style="font-size: medium;">The resulting diff_dat is our differential analysis result.</span></div>
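<p>One detail of the limma code above is worth checking on a small scale: renaming the design columns to "case" and "control" is only correct because R sorts factor levels alphabetically. The sketch below uses a made-up group of 2 controls and 3 cases in place of the real 42 and 67, and needs only base R. It also assumes, as the code above does, that the control samples come first in the order in which the CEL files were read.</p>

```r
# Tiny stand-in for the real design: 2 control samples followed by 3 case samples.
group <- factor(c(rep("control", 2), rep("case", 3)))

# Levels are sorted alphabetically, not by order of appearance.
print(levels(group))

# One indicator column per level, in level order.
design <- model.matrix(~0 + group)
print(colnames(design))

# Renaming from levels(group) keeps names and columns in step.
colnames(design) <- levels(group)
print(design)
```

<p>Taking the names from levels(group) instead of typing them by hand avoids silently swapping the two groups if the labels ever change.</p>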
<div>we choose the log fold cut off change to be “2” to get a manageable set of genes.<br />
The original article says: we were able to get a list of 2567 genes after removing the duplicates and the not available genes</div>
<div>Here we select differentially expressed genes by a single criterion, the log fold change cutoff of “2”, and I ended up with only 782 probes.</div>
<div>Next we annotate these probes to obtain gene names; here I use the biomaRt package for the annotation.</div>
<div>Our platform is Affymetrix U133 gene chips; although it has 22283 probes, they correspond to only 13908 genes.</div>
<div>So the code is as follows:</div>
<div>library(biomaRt)</div>
<div>ensembl=useMart("ensembl",dataset="hsapiens_gene_ensembl")</div>
<div>gene_probe=getBM(attributes=c("hgnc_symbol","affy_hg_u133a"),filters="affy_hg_u133a",values=rownames(diff_dat),mart=ensembl)</div>
<div>diff_probe=rownames(diff_dat[abs(diff_dat[,1])&gt;2,])</div>
<div>diff_gene=gene_probe[match(diff_probe,gene_probe[,2]),1]</div>
<div>diff_gene=na.omit(diff_gene)</div>
<div>diff_gene=unique(diff_gene)</div>
<div>length(diff_gene)</div>
<div>This yields 604 differentially expressed genes.</div>
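<p>The probe-to-gene steps above can be followed on a toy table without querying Ensembl. Everything in the sketch below is made up: the probe IDs, the gene symbols and the logFC values. Only the column layout mirrors the code above, with logFC as the first column of diff_dat and the probe ID as the second column of gene_probe.</p>

```r
# Made-up limma-style result: row names are probe IDs, first column is logFC.
diff_dat <- data.frame(logFC = c(2.5, -3.1, 0.4, 2.2, -2.8),
                       row.names = c("pA", "pB", "pC", "pD", "pE"))

# Made-up annotation table: two probes (pA, pD) hit the same gene, pE has no entry.
gene_probe <- data.frame(hgnc_symbol   = c("G1", "G2", "G3", "G1"),
                         affy_hg_u133a = c("pA", "pB", "pC", "pD"),
                         stringsAsFactors = FALSE)

# Probes whose absolute log fold change exceeds 2.
diff_probe <- rownames(diff_dat)[abs(diff_dat$logFC) > 2]

# Look up each probe's symbol; probes without annotation become NA.
diff_gene <- gene_probe$hgnc_symbol[match(diff_probe, gene_probe$affy_hg_u133a)]

# Drop unannotated probes and collapse probes that share a gene.
diff_gene <- unique(diff_gene[!is.na(diff_gene)])

print(diff_probe)
print(diff_gene)
```

<p>This is why the gene count is smaller than the probe count: unannotated probes are dropped and several probes can map to one gene.</p>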
<div>Then I run GO and KEGG enrichment analyses.</div>
<div>gene_entrez=getBM(attributes=c("hgnc_symbol","entrezgene"),filters="hgnc_symbol",values=diff_gene,mart=ensembl)</div>
<div>require(DOSE)</div>
<div>require(clusterProfiler)</div>
<div>gene_entrez=na.omit(gene_entrez)</div>
<div>gene=as.character(gene_entrez[,2])</div>
<div>ego &lt;- enrichGO(gene=gene,organism="human",ont="CC",pvalueCutoff=0.01,readable=TRUE)</div>
<div>ekk &lt;- enrichKEGG(gene=gene,organism="human",pvalueCutoff=0.01,readable=TRUE)</div>
<div>write.csv(summary(ekk),"KEGG-enrich.csv",row.names =F)</div>
<div>write.csv(summary(ego),"GO-enrich.csv",row.names =F)</div>
<p>I have not uploaded the figures; you can reproduce the whole workflow yourself with the same code.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1024.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
