<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; geneset</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/geneset/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee.</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Making your own gene set file for GSEA</title>
		<link>http://www.bio-info-trainee.com/2144.html</link>
		<comments>http://www.bio-info-trainee.com/2144.html#comments</comments>
		<pubDate>Thu, 15 Dec 2016 11:43:56 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic data formats]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[geneset]]></category>
		<category><![CDATA[GMT]]></category>
		<category><![CDATA[GSEA]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2144</guid>
		<description><![CDATA[Anyone familiar with GSEA knows that it needs only GCT, CLS and GMT files. Of these, the GMT file, G &#8230; <a href="http://www.bio-info-trainee.com/2144.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
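				<content:encoded><![CDATA[<p>Anyone familiar with GSEA knows that it needs only GCT, CLS and GMT files. For the GMT files, the GSEA authors already provide a large collection: Broad's <a href="http://software.broadinstitute.org/gsea/msigdb/collections.jsp">Molecular Signatures Database (MSigDB) </a>has already collected 18026 gene sets. <span style="color: #ff00ff;"><strong>What puzzles me is that it does not include a cancer testis gene set. MSigDB is certainly large, but it is not necessarily complete, and it actually contains many duplicates, as well as quite a few gene sets that are almost meaningless.</strong></span> So if I want to run GSEA with my own gene set, I have to produce the GMT-format data myself. After all, the gene sets downloaded from MSigDB are, in essence, just GMT-format data, in which each line is one gene set: the set name, a description, then the member genes, all separated by tabs. See <a href="http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29">http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29</a><span id="more-2144"></span></p>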
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/41.png"><img class="alignnone size-full wp-image-2145" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/41.png" alt="4" width="937" height="615" /></a></div>
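<div>First we need the gene list for the gene set we are interested in, preferably as the standard gene symbols defined by HUGO.</div>
<div>For example, the one I am interested in is: <a href="http://www.cta.lncc.br/modelo.php">http://www.cta.lncc.br/modelo.php</a></div>
<div>Here I start from a two-column file and give R code that converts it directly into GMT!</div>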
<div>
<div>The file comes from <a href="http://www.bio-info-trainee.com/1188.html">Downloading the latest KEGG information and parsing it</a>, and looks like this:</div>
<div><img class="alignnone" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image004.png" alt="" width="745" height="326" /></div>
<div>First, in R, set the variable path2gene_file to the path of the kegg2gene.txt file shown in the figure, then read the file into R:</div>
<div># read every column as character so that IDs are not converted to numbers</div>
<div>tmp &lt;- read.table(path2gene_file,sep="\t",colClasses=c('character'))</div>
<div>#tmp=toTable(org.Hs.egPATH)</div>
<div># first column is KEGG ID, second column is Entrez ID</div>
<div># split() groups one column by the other and returns a plain named list</div>
<div>GeneID2kegg_list &lt;- split(tmp[,1],tmp[,2])</div>
<div>kegg2GeneID_list &lt;- split(tmp[,2],tmp[,1])</div>
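<div>As a minimal sketch of the reading and grouping steps, here is the same idea on a made-up two-column file, with base R's split() doing the grouping. The pathway and gene IDs are invented for illustration and are not taken from kegg2gene.txt.</div>

```r
# Toy stand-in for kegg2gene.txt (IDs are invented for illustration):
# column 1 = KEGG pathway ID, column 2 = Entrez gene ID
path2gene_file <- tempfile(fileext = ".txt")
writeLines(c("00010\t101", "00010\t102", "00020\t102", "00020\t103"),
           path2gene_file)

# Read everything as character so that IDs such as "00010" keep their zeros
tmp <- read.table(path2gene_file, sep = "\t", colClasses = "character")

# Group one column by the other to get the two lookup lists
kegg2GeneID_list <- split(tmp[, 2], tmp[, 1])
GeneID2kegg_list <- split(tmp[, 1], tmp[, 2])

print(kegg2GeneID_list[["00010"]])  # genes in pathway 00010
print(GeneID2kegg_list[["102"]])    # pathways containing gene 102
```

<div>Each element of kegg2GeneID_list is a character vector of gene IDs, named by its pathway ID.</div>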
<div>The variable kegg2GeneID_list is a list. Because it holds Entrez gene IDs, they need to be converted to symbols. I won't go into the details here; the converted data is kegg2symbol_list.</div>
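<div>The post skips the conversion itself, so what follows is only one possible sketch, not necessarily what was done here. It assumes an Entrez-to-symbol lookup is already available as a named character vector; the name entrez2symbol and the toy pathway names are hypothetical. In practice such a lookup could be built from an annotation package, for example org.Hs.eg.db, which is not used here.</div>

```r
# Hypothetical lookup table: names are Entrez IDs, values are gene symbols
entrez2symbol <- c("7157" = "TP53", "672" = "BRCA1", "1956" = "EGFR")

# Toy pathway-to-Entrez list (pathway names are placeholders;
# "999999" stands for an ID that has no symbol in the lookup)
kegg2GeneID_list <- list(
  pathwayA = c("7157", "672"),
  pathwayB = c("1956", "7157", "999999")
)

# Map every set to symbols, dropping unmapped IDs and duplicates
kegg2symbol_list <- lapply(kegg2GeneID_list, function(ids) {
  symbols <- unname(entrez2symbol[ids])
  unique(symbols[!is.na(symbols)])
})

print(kegg2symbol_list)
```

<div>Dropping unmapped IDs is a design choice; it may be worth logging them, since a gene silently missing from a set changes the enrichment result.</div>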
<div>Finally, write kegg2symbol_list out as a GMT file:</div>
<div>
<blockquote>
<div># one line per gene set: name, description (NA here), then the genes, all tab-separated</div>
<div>write.gmt &lt;- function(geneSet=kegg2symbol_list,gmt_file='kegg2symbol.gmt'){</div>
<div>sink( gmt_file )</div>
<div>on.exit(sink()) # restore console output even if an error occurs</div>
<div>for (i in seq_along(geneSet)){ # seq_along() is safe for an empty list</div>
<div>cat(names(geneSet)[i])</div>
<div>cat('\tNA\t')</div>
<div>cat(paste(geneSet[[i]],collapse = '\t'))</div>
<div>cat('\n')</div>
<div>}</div>
<div>}</div>
</blockquote>
</div>
</div>
<div><img class="alignnone size-full wp-image-2146" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/5.png" alt="5" width="555" height="562" /></div>
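<div>To check that an exported file really has the GMT layout, a round trip can help. The sketch below writes a toy list and reads it back. It uses writeLines() instead of sink(), and the helper names write_gmt_lines and read_gmt, as well as the toy gene sets, are hypothetical.</div>

```r
# Hypothetical helper: build one GMT line per gene set, then write them at once
write_gmt_lines <- function(geneSet, gmt_file, description = "NA") {
  gmt_lines <- vapply(names(geneSet), function(set_name) {
    paste(c(set_name, description, geneSet[[set_name]]), collapse = "\t")
  }, character(1))
  writeLines(gmt_lines, gmt_file)
}

# Hypothetical helper: parse a GMT file back into a named list of gene vectors
read_gmt <- function(gmt_file) {
  fields <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)
  gene_sets <- lapply(fields, function(x) x[-(1:2)])  # drop name, description
  names(gene_sets) <- vapply(fields, function(x) x[1], character(1))
  gene_sets
}

toy_sets <- list(setA = c("TP53", "BRCA1"), setB = c("EGFR", "TP53", "MYC"))
gmt_file <- tempfile(fileext = ".gmt")
write_gmt_lines(toy_sets, gmt_file)

round_trip <- read_gmt(gmt_file)
print(identical(round_trip, toy_sets))
```

<div>If the list read back matches the list written, every line has a name, a description and tab-separated genes, which is the layout described in the GMT documentation.</div>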
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2144.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
