<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Probes</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e6%8e%a2%e9%92%88/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Welcome to leave comments and join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Annotating microarray probes with gene IDs or symbols, and keeping the highest-expressed probe for each gene</title>
		<link>http://www.bio-info-trainee.com/1502.html</link>
		<comments>http://www.bio-info-trainee.com/1502.html#comments</comments>
		<pubDate>Tue, 29 Mar 2016 10:14:06 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic data formats]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[Probes]]></category>
		<category><![CDATA[Microarrays]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1502</guid>
		<description><![CDATA[Doing this in R is actually very simple; the hard part is that many packages often fail to install, &#8230; <a href="http://www.bio-info-trainee.com/1502.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
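				<content:encoded><![CDATA[<p>Doing this in R is actually very simple. The hard part is that many packages often fail to install. Some people don't even check which platform their array is, or which annotation package goes with it, before asking for help everywhere. Their ability to learn on their own is worrying!</p>
<p>I wrote earlier about the mapping between every array platform supported by Bioconductor and its annotation package: <a title="Read more: Getting the probe-to-gene mapping for all arrays via Bioconductor packages" href="http://www.bio-info-trainee.com/1399.html" rel="bookmark">Getting the probe-to-gene mapping for all arrays via Bioconductor packages</a></p>
<p>That is actually a clumsy approach, because it retrieves every probe-ID-to-gene mapping there is, which is a detour. Normally you only need to find the array's probe-to-gene annotation on GEO, and there's no need to download that many packages. The advantage is also obvious, though. For many beginners, a package that solves the problem saves a lot of trouble, as with the conversion below:</p>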
<blockquote>
<div>suppressPackageStartupMessages(library(CLL))</div>
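<div>## This package ships with the example dataset we need</div>
<div>data(sCLLex)  ## 22 samples in two groups, handy for testing differential expression analysis directly</div>
<div>library(hgu95av2.db) <strong> ## Be sure you know which annotation package matches your array</strong></div>
<div><a href="http://www.bio-info-trainee.com/1399.html"><strong>## Common array platforms all have a matching Bioconductor annotation package</strong></a></div>
<div>exprSet=exprs(sCLLex)  ## expression matrix; its row names are probe IDs, which mean nothing on their own and must be converted</div>
<div>## First take all the probe IDs. <span style="color: #ff0000;">#There are three ways to get symbols here (Entrez IDs work too)</span></div>
<div>probeset=rownames(exprSet)</div>
<div>Symbol=sapply(as.list(<span style="color: #ff0000;">hgu95av2SYMBOL</span>[probeset]),function(x) x[1])  ## unmapped probes stay NA; as.character() would have turned them into the string "NA"</div>
<div><span style="color: #ff0000;">#the annotate package provides   getSYMBOL( probeset ,"hgu95av2" )   (run library(annotate) first)</span></div>
<div><span style="color: #ff0000;">#or use annotate's lookUp function   lookUp( probeset , "hgu95av2", "SYMBOL")</span></div>
<div><span style="color: #ff0000;">#these are just tricks</span></div>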
<div>a=cbind.data.frame(Symbol,exprSet)</div>
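<div><strong><span style="color: #ff0000;">## The function below keeps, for each gene, the probe with the highest mean expression</span></strong></div>
<div>## input: a data frame whose first column is the gene symbol and whose other columns are samples</div>
<div>rmDupID &lt;-function(a=matrix(c(1,1:5,2,2:6,2,3:7),ncol=6)){</div>
<div>  exprSet=a[,-1]</div>
<div>  ## mean expression of each probe across all samples</div>
<div>  probeMeans=apply(exprSet,1,function(x) mean(as.numeric(x),na.rm=TRUE))</div>
<div>  ## sort probes from highest to lowest mean expression</div>
<div>  a=a[order(probeMeans,decreasing=TRUE),]</div>
<div>  ## keep only the first, i.e. highest-expressed, probe of each symbol</div>
<div>  exprSet=a[!duplicated(a[,1]),]</div>
<div>  ## drop probes that have no gene symbol</div>
<div>  exprSet=exprSet[!is.na(exprSet[,1]),]</div>
<div>  ## use the symbols as row names, then remove the symbol column</div>
<div>  rownames(exprSet)=exprSet[,1]</div>
<div>  exprSet=exprSet[,-1]</div>
<div>  return(exprSet)</div>
<div>}</div>
<div>exprSet=rmDupID(a)</div>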
</blockquote>
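<div><strong>Keeping the highest-expressed probe for each gene</strong> is just one way to handle duplicate probes. It's how I usually process arrays, but it's not necessarily the best way!</div>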
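<p>Here is a minimal base-R sketch of the same idea: keep, for each gene, the probe with the highest mean expression. It runs on a small made-up matrix, so it needs no Bioconductor packages. The helper <code>pick_max_probe()</code> and the toy data are hypothetical and not part of any package. Unlike <code>rmDupID</code>, it drops unannotated probes before de-duplicating and returns a numeric matrix rather than a data frame.</p>
<pre>
```r
## Toy data, made up purely for illustration: 5 probes x 3 samples
expr = matrix(c(5, 1, 9, 2, 7,
                6, 2, 8, 3, 7,
                7, 1, 9, 4, 6),
              nrow = 5,
              dimnames = list(paste0("probe", 1:5), paste0("s", 1:3)))
symbol = c("TP53", "TP53", "GAPDH", NA, "GAPDH")

## pick_max_probe() is a hypothetical helper, not from any package
pick_max_probe = function(expr, symbol) {
  keep = !is.na(symbol)                       # drop probes without a symbol
  expr = expr[keep, , drop = FALSE]
  symbol = symbol[keep]
  probe_mean = rowMeans(expr, na.rm = TRUE)   # mean expression of each probe
  ord = order(probe_mean, decreasing = TRUE)  # highest mean first
  expr = expr[ord, , drop = FALSE]
  symbol = symbol[ord]
  first = !duplicated(symbol)                 # first hit = highest-mean probe
  out = expr[first, , drop = FALSE]
  rownames(out) = symbol[first]               # gene symbols become row names
  out
}

res = pick_max_probe(expr, symbol)
print(res)
```
</pre>
<p>Here probe3 (mean about 8.67) is kept for GAPDH and probe1 (mean 6) for TP53. Replacing <code>rowMeans</code> with another per-probe summary, such as <code>apply(expr, 1, median)</code>, would change which probe is kept.</p>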
<div></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1502.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting microarray probe-to-gene mappings in R, a trilogy: Bioconductor</title>
		<link>http://www.bio-info-trainee.com/1399.html</link>
		<comments>http://www.bio-info-trainee.com/1399.html#comments</comments>
		<pubDate>Mon, 15 Feb 2016 15:41:55 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[Basic data formats]]></category>
		<category><![CDATA[Bioinformatics basics]]></category>
		<category><![CDATA[Probes]]></category>
		<category><![CDATA[Biochips]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1399</guid>
		<description><![CDATA[There are far too many kinds of gene microarrays! But only a few are important and widely used! When analyzing array data &#8230; <a href="http://www.bio-info-trainee.com/1399.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
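				<content:encoded><![CDATA[<p>There are far too many kinds of gene microarrays out there!</p>
<div>But only a few of them are important and widely used!</div>
<div>Analyzing array data usually requires converting probe IDs to gene IDs. I generally prefer Entrez gene IDs.</div>
<div><strong><span style="color: #ff0000;">There are generally three ways to get the mapping between array probes and genes.</span></strong></div>
<div><strong><span style="color: #ff0000;">The gold standard, of course, is to download it directly from the array manufacturer's official website!!!</span></strong></div>
<div><strong><span style="color: #ff0000;">Another is to use a Bioconductor package directly.</span></strong>
<div><strong><span style="color: #ff0000;">The third is to download a file from NCBI and parse it!</span></strong></div>
<div>First, the manufacturer's website. The annotation can definitely be found there, otherwise the array would be pointless!</div>
<div>Next, the files downloaded from NCBI, which tend to be fairly large:</div>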
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL6947">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL6947</a></div>
<div>Both of these routes are fairly tedious, because you have to handle each platform one at a time!</div>
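<div>If you take the NCBI GEO route, the platform (GPL) page offers a full annotation table that can be read into R. The sketch below makes three assumptions, based on many Affymetrix GPL pages: the table is tab-separated, its columns are named <code>ID</code> and <code>Gene Symbol</code>, and multiple symbols are separated by <code>///</code>. The probe IDs and gene names are made up, so check the column names of your own file first. The code builds a probe-to-symbol lookup vector and applies it to the row names of an expression matrix.</div>
<pre>
```r
## Made-up excerpt of a tab-separated GPL annotation table (column names assumed)
gpl_text = "ID\tGene Symbol
p1_at\tGENEA /// GENEB
p2_at\tGENEC
p3_at\t
p4_at\tGENEC"
gpl = read.delim(text = gpl_text, check.names = FALSE,
                 stringsAsFactors = FALSE)

## keep the first symbol of multi-gene probes
first_symbol = sapply(strsplit(gpl[["Gene Symbol"]], "///", fixed = TRUE),
                      function(x) trimws(x[1]))
first_symbol[which(first_symbol == "")] = NA   # empty cells mean "no gene"
probe2symbol = setNames(first_symbol, gpl$ID)

## hypothetical expression matrix whose row names are probe IDs
expr = matrix(1:8, nrow = 4,
              dimnames = list(c("p4_at", "p1_at", "p3_at", "p2_at"),
                              c("s1", "s2")))
symbols = unname(probe2symbol[rownames(expr)])
print(symbols)
```
</pre>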
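<div>So what I'll cover next is using Bioconductor packages in R to get probe-to-gene mappings for many platforms in batch!</div>
<div>Most important arrays have a corresponding Bioconductor package in R, and a single R package can list, in batch, the array platforms that have annotation packages. I picked the common species, as follows:</div>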
<div></div>
<div>
<blockquote>
<table>
<thead><tr><th>gpl</th><th>organism</th><th>bioc_package</th></tr></thead>
<tbody>
<tr><td>GPL32</td><td>Mus musculus</td><td>mgu74a</td></tr>
<tr><td>GPL33</td><td>Mus musculus</td><td>mgu74b</td></tr>
<tr><td>GPL34</td><td>Mus musculus</td><td>mgu74c</td></tr>
<tr><td>GPL74</td><td>Homo sapiens</td><td>hcg110</td></tr>
<tr><td>GPL75</td><td>Mus musculus</td><td>mu11ksuba</td></tr>
<tr><td>GPL76</td><td>Mus musculus</td><td>mu11ksubb</td></tr>
<tr><td>GPL77</td><td>Mus musculus</td><td>mu19ksuba</td></tr>
<tr><td>GPL78</td><td>Mus musculus</td><td>mu19ksubb</td></tr>
<tr><td>GPL79</td><td>Mus musculus</td><td>mu19ksubc</td></tr>
<tr><td>GPL80</td><td>Homo sapiens</td><td>hu6800</td></tr>
<tr><td>GPL81</td><td>Mus musculus</td><td>mgu74av2</td></tr>
<tr><td>GPL82</td><td>Mus musculus</td><td>mgu74bv2</td></tr>
<tr><td>GPL83</td><td>Mus musculus</td><td>mgu74cv2</td></tr>
<tr><td>GPL85</td><td>Rattus norvegicus</td><td>rgu34a</td></tr>
<tr><td>GPL86</td><td>Rattus norvegicus</td><td>rgu34b</td></tr>
<tr><td>GPL87</td><td>Rattus norvegicus</td><td>rgu34c</td></tr>
<tr><td>GPL88</td><td>Rattus norvegicus</td><td>rnu34</td></tr>
<tr><td>GPL89</td><td>Rattus norvegicus</td><td>rtu34</td></tr>
<tr><td>GPL91</td><td>Homo sapiens</td><td>hgu95av2</td></tr>
<tr><td>GPL92</td><td>Homo sapiens</td><td>hgu95b</td></tr>
<tr><td>GPL93</td><td>Homo sapiens</td><td>hgu95c</td></tr>
<tr><td>GPL94</td><td>Homo sapiens</td><td>hgu95d</td></tr>
<tr><td>GPL95</td><td>Homo sapiens</td><td>hgu95e</td></tr>
<tr><td>GPL96</td><td>Homo sapiens</td><td>hgu133a</td></tr>
<tr><td>GPL97</td><td>Homo sapiens</td><td>hgu133b</td></tr>
<tr><td>GPL98</td><td>Homo sapiens</td><td>hu35ksuba</td></tr>
<tr><td>GPL99</td><td>Homo sapiens</td><td>hu35ksubb</td></tr>
<tr><td>GPL100</td><td>Homo sapiens</td><td>hu35ksubc</td></tr>
<tr><td>GPL101</td><td>Homo sapiens</td><td>hu35ksubd</td></tr>
<tr><td>GPL201</td><td>Homo sapiens</td><td>hgfocus</td></tr>
<tr><td>GPL339</td><td>Mus musculus</td><td>moe430a</td></tr>
<tr><td>GPL340</td><td>Mus musculus</td><td>mouse4302</td></tr>
<tr><td>GPL341</td><td>Rattus norvegicus</td><td>rae230a</td></tr>
<tr><td>GPL342</td><td>Rattus norvegicus</td><td>rae230b</td></tr>
<tr><td>GPL570</td><td>Homo sapiens</td><td>hgu133plus2</td></tr>
<tr><td>GPL571</td><td>Homo sapiens</td><td>hgu133a2</td></tr>
<tr><td>GPL886</td><td>Homo sapiens</td><td>hgug4111a</td></tr>
<tr><td>GPL887</td><td>Homo sapiens</td><td>hgug4110b</td></tr>
<tr><td>GPL1261</td><td>Mus musculus</td><td>mouse430a2</td></tr>
<tr><td>GPL1352</td><td>Homo sapiens</td><td>u133x3p</td></tr>
<tr><td>GPL1355</td><td>Rattus norvegicus</td><td>rat2302</td></tr>
<tr><td>GPL1708</td><td>Homo sapiens</td><td>hgug4112a</td></tr>
<tr><td>GPL2891</td><td>Homo sapiens</td><td>h20kcod</td></tr>
<tr><td>GPL2898</td><td>Rattus norvegicus</td><td>adme16cod</td></tr>
<tr><td>GPL3921</td><td>Homo sapiens</td><td>hthgu133a</td></tr>
<tr><td>GPL4191</td><td>Homo sapiens</td><td>h10kcod</td></tr>
<tr><td>GPL5689</td><td>Homo sapiens</td><td>hgug4100a</td></tr>
<tr><td>GPL6097</td><td>Homo sapiens</td><td>illuminaHumanv1</td></tr>
<tr><td>GPL6102</td><td>Homo sapiens</td><td>illuminaHumanv2</td></tr>
<tr><td>GPL6244</td><td>Homo sapiens</td><td>hugene10sttranscriptcluster</td></tr>
<tr><td>GPL6947</td><td>Homo sapiens</td><td>illuminaHumanv3</td></tr>
<tr><td>GPL8300</td><td>Homo sapiens</td><td>hgu95av2</td></tr>
<tr><td>GPL8490</td><td>Homo sapiens</td><td>IlluminaHumanMethylation27k</td></tr>
<tr><td>GPL10558</td><td>Homo sapiens</td><td>illuminaHumanv4</td></tr>
<tr><td>GPL11532</td><td>Homo sapiens</td><td>hugene11sttranscriptcluster</td></tr>
<tr><td>GPL13497</td><td>Homo sapiens</td><td>HsAgilentDesign026652</td></tr>
<tr><td>GPL13534</td><td>Homo sapiens</td><td>IlluminaHumanMethylation450k</td></tr>
<tr><td>GPL13667</td><td>Homo sapiens</td><td>hgu219</td></tr>
<tr><td>GPL15380</td><td>Homo sapiens</td><td>GGHumanMethCancerPanelv1</td></tr>
<tr><td>GPL15396</td><td>Homo sapiens</td><td>hthgu133b</td></tr>
<tr><td>GPL17897</td><td>Homo sapiens</td><td>hthgu133a</td></tr>
</tbody>
</table>
</blockquote>
</div>
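<div>Once you know a dataset's GPL accession, the table above works as a lookup for which annotation package to install. Below is a minimal base-R sketch using three rows copied from that table. <code>gpl_to_db()</code> is a hypothetical helper, and accessions missing from the table come back as <code>NA</code>.</div>
<pre>
```r
## A few rows copied from the table above; in practice read the full table
gpl_info = data.frame(
  gpl = c("GPL96", "GPL570", "GPL6947"),
  organism = c("Homo sapiens", "Homo sapiens", "Homo sapiens"),
  bioc_package = c("hgu133a", "hgu133plus2", "illuminaHumanv3"),
  stringsAsFactors = FALSE)

## gpl_to_db() is a hypothetical helper: GPL accession -> ".db" package name
gpl_to_db = function(gpl_id, info) {
  pkg = trimws(info$bioc_package[match(gpl_id, info$gpl)])
  ifelse(is.na(pkg), NA_character_, paste0(pkg, ".db"))
}

## "GPL0" stands in for an accession that is not in the table
print(gpl_to_db(c("GPL570", "GPL6947", "GPL0"), gpl_info))
```
</pre>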
<div>All of these packages need to be downloaded first:</div>
<div>
<blockquote>
<div>gpl_info=read.csv("GPL_info.csv",stringsAsFactors = FALSE)  ## the table above, saved as a CSV file</div>
<div>### first download all of the annotation packages from bioconductor</div>
<div>for (i in 1:nrow(gpl_info)){</div>
<div>  print(i)</div>
<div>  platform=gpl_info[i,4]  ## assumes column 4 holds the package name (column 1 being the row index written by write.csv)</div>
<div>  platform=trimws(platform)  ## mainly because the package names in my table had leading spaces</div>
<div>  #platformDB='hgu95av2.db'</div>
<div>  platformDB=paste(platform,".db",sep="")</div>
<div>  if( !(platformDB %in% rownames(installed.packages())) ) {</div>
<div>    BiocManager::install(platformDB)  ## current installer; needs install.packages("BiocManager") once</div>
<div>    ## older R versions used: BiocInstaller::biocLite(platformDB)</div>
<div>    ## or: source("http://bioconductor.org/biocLite.R"); biocLite(platformDB)</div>
<div>  }</div>
<div>}</div>
</blockquote>
</div>
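<div>The loop above checks and installs packages one at a time. Alternatively, you could first work out which packages are missing and then install them all in a single call. <code>missing_pkgs()</code> is a hypothetical helper. Passing <code>installed</code> explicitly here only makes the example deterministic. The install line is commented out because it assumes <code>BiocManager</code> is available.</div>
<pre>
```r
## missing_pkgs() is a hypothetical helper; by default it checks this machine
missing_pkgs = function(pkgs, installed = rownames(installed.packages())) {
  setdiff(unique(pkgs), installed)
}

needed = paste0(c("hgu95av2", "hgu133plus2", "illuminaHumanv3"), ".db")
## pretend only hgu95av2.db is installed, to keep the example reproducible
todo = missing_pkgs(needed, installed = c("hgu95av2.db"))
print(todo)
## then, assuming BiocManager is installed, one call installs them all:
## if (length(todo) > 0) BiocManager::install(todo)
```
</pre>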
<blockquote>
<div>Once all the packages are downloaded, you can batch-export the probe-to-gene mappings!</div>
</blockquote>
<div>
<blockquote>
<div>library(annotate)  ## provides lookUp()</div>
<div>for (i in 1:nrow(gpl_info)){</div>
<div>  print(i)</div>
<div>  platform=trimws(gpl_info[i,4])</div>
<div>  #platformDB='hgu95av2.db'</div>
<div>  platformDB=paste(platform,".db",sep="")</div>
<div></div>
<div>  if( platformDB %in% rownames(installed.packages()) ) {</div>
<div>    library(platformDB,character.only = TRUE)</div>
<div>    #tmp=paste('head(mappedkeys(',platform,'ENTREZID))',sep='')</div>
<div>    #eval(parse(text = tmp))</div>
<div>###the key trick: run a string as an R command (get(paste0(platform,'ENTREZID')) would also work)</div>
<div>    all_probe=eval(parse(text = paste('mappedkeys(',platform,'ENTREZID)',sep='')))</div>
<div>    EGID &lt;- as.numeric(lookUp(all_probe, platformDB, "ENTREZID"))</div>
<div>##then write the mapping out however you like, for example:</div>
<div>    write.csv(data.frame(probe_id=all_probe, entrez_id=EGID), paste0(platform,"_probe2entrez.csv"), row.names=FALSE)</div>
<div>  }</div>
<div>}</div>
</blockquote>
</div>
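<div>For the step of writing out the mapping, one option is a long two-column table with one row per probe-gene pair, which also keeps probes that map to several genes. The sketch below uses a made-up list shaped like the output of <code>as.list()</code> on a Bimap: one element per probe, which may be <code>NA</code> or contain several IDs. The output file name is hypothetical, and the file is written to a temporary directory.</div>
<pre>
```r
## Made-up probe -> Entrez ID mapping: one element per probe,
## possibly NA (unmapped) or several IDs
probe_map = list(p1_at = "1001", p2_at = NA, p3_at = c("1002", "1003"))

## one row per probe-gene pair
map_df = data.frame(
  probe_id = rep(names(probe_map), lengths(probe_map)),
  entrez_id = unlist(probe_map, use.names = FALSE),
  stringsAsFactors = FALSE)
map_df = map_df[!is.na(map_df$entrez_id), ]   # drop unmapped probes

## hypothetical output file, written to a temporary directory here
out_file = file.path(tempdir(), "probe2entrez_example.csv")
write.csv(map_df, out_file, row.names = FALSE)
back = read.csv(out_file, colClasses = "character")
print(back)
```
</pre>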
<div>Reference: <a href="http://blog.sina.com.cn/s/blog_62b37bfe0101jbuq.html">http://blog.sina.com.cn/s/blog_62b37bfe0101jbuq.html</a></div>
<div></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1399.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
