<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; GSEA</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/gsea/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the biotrainee.com forum, or follow our WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Making your own gene set (GMT) file for GSEA</title>
		<link>http://www.bio-info-trainee.com/2144.html</link>
		<comments>http://www.bio-info-trainee.com/2144.html#comments</comments>
		<pubDate>Thu, 15 Dec 2016 11:43:56 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic data formats]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[geneset]]></category>
		<category><![CDATA[GMT]]></category>
		<category><![CDATA[GSEA]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2144</guid>
		<description><![CDATA[Anyone familiar with GSEA knows that it needs only GCT, CLS and GMT files. For GMT files, the G &#8230; <a href="http://www.bio-info-trainee.com/2144.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Anyone familiar with GSEA knows that it needs only GCT, CLS and GMT files. For GMT files, the GSEA authors already provide a huge collection: the Broad Institute's <a href="http://software.broadinstitute.org/gsea/msigdb/collections.jsp">Molecular Signatures Database (MSigDB) </a> already contains 18,026 gene sets. <span style="color: #ff00ff;"><strong>What surprised me, though, is that it does not include a cancer-testis gene set. MSigDB is large but not complete; it also contains a lot of redundancy, and quite a few of its gene sets are nearly meaningless.</strong></span> To run GSEA on gene sets of my own, I need to produce GMT-format data myself. After all, even the MSigDB gene sets you download are nothing more than GMT files: <a href="http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29">http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29</a><span id="more-2144"></span></p>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/41.png"><img class="alignnone size-full wp-image-2145" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/41.png" alt="4" width="937" height="615" /></a></div>
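<div>First, obtain the gene list of the gene set you are interested in, preferably as official gene symbols as defined by the HUGO Gene Nomenclature Committee (HGNC).</div>
<div>For example, the set I am interested in comes from: <a href="http://www.cta.lncc.br/modelo.php">http://www.cta.lncc.br/modelo.php</a></div>
<div>Here I provide R code that converts a two-column file directly into GMT format.</div>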
<div>
<div>The file comes from my earlier post <a href="http://www.bio-info-trainee.com/1188.html">Downloading and parsing the latest KEGG information</a> and looks like this:</div>
<div><img class="alignnone" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image004.png" alt="" width="745" height="326" /></div>
<div>First, in R, set the variable path2gene_file to the path of the kegg2gene.txt file shown above, then read it in:</div>
<div>tmp=read.table(path2gene_file,sep="\t",colClasses=c('character'))</div>
<div>#tmp=toTable(org.Hs.egPATH)  # alternative source (org.Hs.eg.db); check its column order before using the lines below</div>
<div># first column is KEGG pathway ID, second column is Entrez gene ID</div>
<div>GeneID2kegg_list &lt;- tapply(tmp[,1],as.factor(tmp[,2]),function(x) x)  # Entrez ID -&gt; KEGG pathway IDs</div>
<div>kegg2GeneID_list &lt;- tapply(tmp[,2],as.factor(tmp[,1]),function(x) x)  # KEGG pathway ID -&gt; Entrez IDs</div>
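<div>A minimal alternative sketch for the two lookup lists, assuming tmp has the KEGG pathway ID in column 1 and the Entrez gene ID in column 2 as described above: base R's split() returns a plain named list, which is slightly easier to work with than the list-array that tapply() returns. The helper name build_maps is hypothetical.</div>

```r
# Hypothetical helper: build both lookup lists from a two-column data frame.
# Assumes column 1 = KEGG pathway ID, column 2 = Entrez gene ID (both character).
build_maps = function(tmp) {
  list(
    kegg2GeneID = split(tmp[[2]], tmp[[1]]),  # pathway -> vector of Entrez IDs
    GeneID2kegg = split(tmp[[1]], tmp[[2]])   # Entrez ID -> vector of pathways
  )
}

# Example with a few illustrative IDs:
tmp = data.frame(V1 = c("hsa00010", "hsa00010", "hsa04110"),
                 V2 = c("2023", "5230", "2023"),
                 stringsAsFactors = FALSE)
maps = build_maps(tmp)
kegg2GeneID_list = maps$kegg2GeneID
```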
<div>The variable kegg2GeneID_list is a list of Entrez gene IDs, which still have to be converted to gene symbols. I won't go into that conversion here; the converted result is kegg2symbol_list.</div>
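<div>The Entrez-to-symbol step can be sketched as follows, assuming you already have a lookup table entrez2symbol: a named character vector whose names are Entrez IDs and whose values are gene symbols (the org.Hs.eg.db Bioconductor package is one possible source, but that is an assumption, not necessarily what the author used). IDs without a symbol are dropped. convert_ids is a hypothetical name.</div>

```r
# Hypothetical helper: map every gene set from Entrez IDs to gene symbols.
# entrez2symbol: named character vector, names = Entrez IDs, values = symbols.
convert_ids = function(id_list, entrez2symbol) {
  lapply(id_list, function(ids) {
    sym = unname(entrez2symbol[as.character(ids)])  # NA for IDs not in the table
    unique(sym[!is.na(sym)])                        # drop unmapped IDs and duplicates
  })
}

# Example with a tiny lookup table:
entrez2symbol = c("1" = "A1BG", "2" = "A2M", "9" = "NAT1")
kegg2symbol_list = convert_ids(list(p1 = c("1", "2", "999"), p2 = c("9", "9")),
                               entrez2symbol)
```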
<div>Finally, write kegg2symbol_list out as a GMT file:</div>
<div>
<blockquote>
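<div>write.gmt &lt;- function(geneSet=kegg2symbol_list,gmt_file='kegg2symbol.gmt'){</div>
<div># GMT format: one gene set per line, tab-separated fields: name, description, gene1, gene2, ...</div>
<div>sink( gmt_file )</div>
<div>on.exit(sink())  # stop redirecting output to the file even if an error occurs</div>
<div>for (i in seq_along(geneSet)){  # seq_along() also handles an empty list, unlike 1:length()</div>
<div>cat(names(geneSet)[i])</div>
<div>cat('\tNA\t')  # the description column is simply filled with "NA"</div>
<div>cat(paste(geneSet[[i]],collapse = '\t'))</div>
<div>cat('\n')</div>
<div>}</div>
<div>}</div>
<div># usage: write.gmt(kegg2symbol_list, 'kegg2symbol.gmt')</div>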
</blockquote>
</div>
</div>
<div><img class="alignnone size-full wp-image-2146" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/5.png" alt="5" width="555" height="562" /></div>
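<div>To check the result, here is a minimal sketch that reads a GMT file back into a named list. It assumes the layout written by write.gmt above (set name in the first tab-separated field, description in the second, genes after that); read_gmt is a hypothetical name.</div>

```r
# Hypothetical helper: read a GMT file back into a named list of gene vectors.
# Each line: set name, description, gene1, gene2, ... separated by tabs.
read_gmt = function(gmt_file) {
  fields = strsplit(readLines(gmt_file), "\t", fixed = TRUE)
  sets = lapply(fields, function(f) f[-(1:2)])            # drop name and description
  names(sets) = vapply(fields, function(f) f[1], character(1))
  sets
}

# e.g. gs = read_gmt('kegg2symbol.gmt'); head(lengths(gs))
```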
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2144.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modifying the ES score plot of the Java version of GSEA</title>
		<link>http://www.bio-info-trainee.com/2105.html</link>
		<comments>http://www.bio-info-trainee.com/2105.html#comments</comments>
		<pubDate>Thu, 01 Dec 2016 16:53:10 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[ES score]]></category>
		<category><![CDATA[GSEA]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2105</guid>
		<description><![CDATA[Before you can modify the ES score plot, you first need to understand what data it shows, because Java &#8230; <a href="http://www.bio-info-trainee.com/2105.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
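				<content:encoded><![CDATA[<p>Before you can modify the ES score plot, you first need to understand what data it shows. The Java GSEA is a closed, prepackaged program, so we cannot change anything it does not expose as a parameter. After a GSEA run, the default plot looks like this:<span id="more-2105"></span></p>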
<div style="width: 513px" class="wp-caption alignnone"><img class="" src="http://note.youdao.com/yws/api/group/23785548/noteresource/9ED49F972A0F4980AE784E76A7DFFC29/version/256?method=get-resource&amp;shareToken=DBDB0277A315444BBBAB2024190208AE&amp;entryId=123732909" alt="" width="503" height="504" /><p class="wp-caption-text">ES score</p></div>
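<p>When you try to use this figure in a publication, you will find it is actually rather blurry, so you may need to redraw it yourself, and for that you need to understand the data behind it.</p>
<p>The bottom panel: suppose a microarray or another measurement method measured 20,000 genes. Their difference between the case and control groups, measured with one of six metrics (the default is signal-to-noise, whose formula is given on the GSEA website; you can also choose the familiar fold change), will differ from gene to gene, so the genes can be ranked by this metric. The panel shows the ranked metric values after Z-score normalization.</p>
<p>The middle panel shows where the genes of the gene set fall among the 20,000 measured genes, already ranked by signal-to-noise.</p>
<p>The top panel: the contributions of all genes are added up one by one, giving the running ES score. The point during this summation where the running score deviates furthest from zero gives the gene set's final ES score (positive or negative).</p>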
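<p>A minimal sketch of the top panel's data for a single gene set, assuming the weighted running sum described in the GSEA paper (Subramanian et al., 2005) with weight exponent p = 1: each gene in the set steps the score up in proportion to |metric|, each gene outside it steps the score down by a constant, and the ES is the running value farthest from zero. running_es is a hypothetical name; its outputs correspond roughly to one row of Obs.RES and to Obs.ES and Obs.arg.ES in the code below, not to the exact GSEA source.</p>

```r
# Hypothetical helper: running enrichment score for one gene set.
# stat:   ranking metric, already sorted in decreasing order (e.g. signal-to-noise)
# in_set: logical vector, TRUE if the gene at that rank belongs to the gene set
running_es = function(stat, in_set, p = 1) {
  hit_w = abs(stat)^p * in_set
  hit  = hit_w / sum(hit_w)                 # step up at genes in the set
  miss = (!in_set) / sum(!in_set)           # constant step down at genes outside it
  res  = cumsum(hit - miss)                 # running enrichment score (RES)
  peak = which.max(abs(res))                # first position of the largest deviation from 0
  list(RES = res, ES = res[peak], arg.ES = peak)
}

stat   = c(3, 2, 1, -1, -2, -3)             # already sorted, decreasing
in_set = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
r = running_es(stat, in_set)
```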
<p>Below is my full annotated breakdown of the plotting function in the R code provided on the GSEA website:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/ES-SCORE图的画法.png"><img class="alignnone size-full wp-image-2106" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/ES-SCORE图的画法.png" alt="How the ES score plot is drawn" width="1574" height="650" /></a></p>
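<p>I have also extracted the function itself:</p>
<p>This part is a bit involved. <strong><span style="color: #ff0000;">I have explained clearly what the data are, but how they are produced (the txt files read by the code below)</span></strong> is something I cannot cover properly in a blog post: obtaining them requires modifying a 2,500-line source file!</p>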
<blockquote><p>setwd('data')<br />
<strong><span style="color: #ff0000;">Obs.RES=read.table('Obs.RES.txt') </span></strong><br />
<strong><span style="color: #ff0000;">Obs.RES=t(Obs.RES) ## running ES score of every gene in every gene set; a matrix</span></strong><br />
<strong><span style="color: #ff0000;">Obs.indicator=read.table('Obs.indicator.txt') </span></strong><br />
<strong><span style="color: #ff0000;">Obs.indicator=t(Obs.indicator) ## whether each gene belongs to each gene set; a 0/1 matrix</span></strong><br />
<strong><span style="color: #ff0000;">obs.s2n=read.table('obs.s2n.txt')[,1]  ## signal-to-noise value of each gene, Z-score-normalized and already sorted</span></strong><br />
<strong><span style="color: #ff0000;">size.G=read.table('size.G.txt')[,1]  ## number of genes in each gene set, shown in the plot</span></strong><br />
<strong><span style="color: #ff0000;">gs.names=read.table('gs.names.txt')[,1] ## name of each gene set, shown in the plot</span></strong><br />
<strong><span style="color: #ff0000;">Obs.arg.ES=read.table('Obs.arg.ES.txt')[,1]## position in the ranked gene list where each gene set reaches its ES score</span></strong><br />
<strong><span style="color: #ff0000;">Obs.ES.index=read.table('Obs.ES.index.txt')[,1]## not needed here; I no longer remember what it is</span></strong><br />
<strong><span style="color: #ff0000;">Obs.ES=read.table('Obs.ES.txt')[,1]  ## ES score of each gene set: a positive value is drawn in red (enriched in the case group), a negative value in blue (enriched in the control group)</span></strong></p>
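<p>## Ng = number of gene sets, N = number of ranked genes; Obs.ES (ES of each gene set) sets the line color<br />
plot_ES_score &lt;- function(Ng=nrow(Obs.RES),N=ncol(Obs.RES),phen1='control',phen2='case',Obs.RES,Obs.indicator,obs.s2n,size.G,gs.names,Obs.arg.ES,Obs.ES.index,Obs.ES){<br />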
for (i in 1:Ng) {<br />
png(paste0('number_',gs.names[i],'.png'))<br />
ind &lt;- 1:N<br />
min.RES &lt;- min(Obs.RES[i,])<br />
max.RES &lt;- max(Obs.RES[i,])<br />
if (max.RES &lt; 0.3) max.RES &lt;- 0.3<br />
if (min.RES &gt; -0.3) min.RES &lt;- -0.3<br />
delta &lt;- (max.RES - min.RES)*0.50<br />
min.plot &lt;- min.RES - 2*delta<br />
max.plot &lt;- max.RES<br />
max.corr &lt;- max(obs.s2n)<br />
min.corr &lt;- min(obs.s2n)<br />
Obs.correl.vector.norm &lt;- (obs.s2n - min.corr)/(max.corr - min.corr)*1.25*delta + min.plot<br />
zero.corr.line &lt;- (- min.corr/(max.corr - min.corr))*1.25*delta + min.plot<br />
col &lt;- ifelse(Obs.ES[i] &gt; 0, 2, 4)</p>
<p># Running enrichment plot</p>
<p>sub.string &lt;- paste("Number of genes: ", N, " (in list), ", size.G[i], " (in gene set)", sep = "", collapse="")</p>
<p>main.string &lt;- paste("Gene Set ", i, ":", gs.names[i])</p>
<p>plot(ind, Obs.RES[i,], main = main.string, sub = sub.string, xlab = "Gene List Index", ylab = "Running Enrichment Score (RES)", xlim=c(1, N), ylim=c(min.plot, max.plot), type = "l", lwd = 2, cex = 1, col = col)<br />
for (j in seq(1, N, 20)) {<br />
lines(c(j, j), c(zero.corr.line, Obs.correl.vector.norm[j]), lwd = 1, cex = 1, col = colors()[12]) # shading of correlation plot<br />
}<br />
lines(c(1, N), c(0, 0), lwd = 1, lty = 2, cex = 1, col = 1) # zero RES line<br />
lines(c(Obs.arg.ES[i], Obs.arg.ES[i]), c(min.plot, max.plot), lwd = 1, lty = 3, cex = 1, col = col) # max enrichment vertical line<br />
for (j in 1:N) {<br />
if (Obs.indicator[i, j] == 1) {<br />
lines(c(j, j), c(min.plot + 1.25*delta, min.plot + 1.75*delta), lwd = 1, lty = 1, cex = 1, col = 1) # enrichment tags<br />
}<br />
}<br />
lines(ind, Obs.correl.vector.norm, type = "l", lwd = 1, cex = 1, col = 1)<br />
lines(c(1, N), c(zero.corr.line, zero.corr.line), lwd = 1, lty = 1, cex = 1, col = 1) # zero correlation horizontal line<br />
temp &lt;- order(abs(obs.s2n), decreasing=T)<br />
arg.correl &lt;- temp[N]<br />
lines(c(arg.correl, arg.correl), c(min.plot, max.plot), lwd = 1, lty = 3, cex = 1, col = 3) # zero crossing correlation vertical line</p>
<p>leg.txt &lt;- paste("\"", phen1, "\" ", sep="", collapse="")<br />
text(x=1, y=min.plot, adj = c(0, 0), labels=leg.txt, cex = 1.0)</p>
<p>leg.txt &lt;- paste("\"", phen2, "\" ", sep="", collapse="")<br />
text(x=N, y=min.plot, adj = c(1, 0), labels=leg.txt, cex = 1.0)</p>
<p>adjx &lt;- ifelse(Obs.ES[i] &gt; 0, 0, 1)</p>
<p>leg.txt &lt;- paste("Peak at ", Obs.arg.ES[i], sep="", collapse="")<br />
text(x=Obs.arg.ES[i], y=min.plot + 1.8*delta, adj = c(adjx, 0), labels=leg.txt, cex = 1.0)</p>
<p>leg.txt &lt;- paste("Zero crossing at ", arg.correl, sep="", collapse="")<br />
text(x=arg.correl, y=min.plot + 1.95*delta, adj = c(adjx, 0), labels=leg.txt, cex = 1.0)<br />
dev.off()<br />
}</p>
<p>}</p>
<p>&nbsp;</p></blockquote>
<p>With this code you can redraw the ES score plots for all current gene sets. To change the font size, tweak the cex values in the code until it looks right.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2105.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A first attempt at explaining the statistics behind GSEA</title>
		<link>http://www.bio-info-trainee.com/2102.html</link>
		<comments>http://www.bio-info-trainee.com/2102.html#comments</comments>
		<pubDate>Thu, 01 Dec 2016 16:39:21 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic software]]></category>
		<category><![CDATA[Bioinformatics basics]]></category>
		<category><![CDATA[ES score]]></category>
		<category><![CDATA[foldchange]]></category>
		<category><![CDATA[GSEA]]></category>
		<category><![CDATA[RES]]></category>
		<category><![CDATA[signal2noise]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2102</guid>
		<description><![CDATA[The Java GSEA software is very easy to use: you only need to prepare the GCT/CLS in &#8230; <a href="http://www.bio-info-trainee.com/2102.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
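				<content:encoded><![CDATA[<p>The Java GSEA software is very easy to use: you only need to prepare the input files in the required GCT/CLS formats. I have written usage tutorials before:</p>
<div><a href="http://www.bio-info-trainee.com/1282.html">Gene set enrichment analysis with GSEA</a></div>
<div><a href="http://www.bio-info-trainee.com/1334.html">Running GSEA in batch from the command line</a></div>
<div>The statistics, however, are trickier. Here is my attempt to explain them in my own way:</div>
<div>Suppose a microarray or another measurement method measured 20,000 genes. Their difference between the case and control groups, measured with one of six metrics (the default is signal-to-noise, whose formula is given on the GSEA website; you can also choose the familiar fold change), will differ from gene to gene, so the genes can be ranked by this metric and Z-score-normalized. This is what the bottom panel of the figure below shows.</div>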
<p><span id="more-2102"></span></p>
<div><img class="alignnone" src="http://note.youdao.com/yws/api/group/23785548/noteresource/9ED49F972A0F4980AE784E76A7DFFC29/version/256?method=get-resource&amp;shareToken=DBDB0277A315444BBBAB2024190208AE&amp;entryId=123732909" alt="" width="503" height="504" /></div>
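<div>The default signal-to-noise metric can be sketched as (mean of case - mean of control) / (sd of case + sd of control), computed per gene and then sorted to give the ranking in the bottom panel. This is a simplified version: as far as I know, GSEA's own implementation also enforces a minimum standard deviation (about 0.2 times the absolute mean), which this sketch omits. signal2noise is a hypothetical name.</div>

```r
# Hypothetical helper: per-gene signal-to-noise between two groups.
# expr:    genes x samples numeric matrix (row names = gene IDs)
# is_case: logical vector over samples, TRUE for case, FALSE for control
signal2noise = function(expr, is_case) {
  a = expr[, is_case, drop = FALSE]
  b = expr[, !is_case, drop = FALSE]
  (rowMeans(a) - rowMeans(b)) / (apply(a, 1, sd) + apply(b, 1, sd))
}

expr = rbind(g1 = c(5, 6, 7, 1, 2, 3),
             g2 = c(1, 2, 3, 1, 2, 3))
is_case = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
ranked = sort(signal2noise(expr, is_case), decreasing = TRUE)   # ranked gene list
```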
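<div>The middle panel shows where the genes of a gene set fall among all 20,000 ranked genes. If they cluster near the top of the list, the gene set is enriched in case; if they cluster near the bottom, it is enriched in control.</div>
<div>The ES score in the top panel is computed roughly as follows:</div>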
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/1.png"><img class="alignnone  wp-image-2103" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/1.png" alt="1" width="725" height="581" /></a></div>
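<div>Read it carefully and it makes sense: each gene's contribution to a gene set's ES score depends on whether the gene belongs to that gene set and on its difference metric (fold change, FC, in the figure above). For each gene set, the contributions of all genes are added up one by one to give the running ES score; the point during this summation where the running score deviates furthest from zero gives the gene set's final ES score.</div>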
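<div>The same running sum can be written out following Subramanian et al. (2005, the PMC paper linked below). Here the N genes are ranked by the metric r_j (fold change in the figure above), S is a gene set containing N_H genes, and p is a weight exponent: p = 1 in the paper's default, while p = 0 reduces to the standard Kolmogorov-Smirnov statistic.</div>

```latex
P_{\mathrm{hit}}(S, i) = \sum_{g_j \in S,\; j \le i} \frac{|r_j|^{p}}{N_R}, \qquad N_R = \sum_{g_j \in S} |r_j|^{p}

P_{\mathrm{miss}}(S, i) = \sum_{g_j \notin S,\; j \le i} \frac{1}{N - N_H}

\mathrm{ES}(S) = \text{the value of } P_{\mathrm{hit}}(S, i) - P_{\mathrm{miss}}(S, i) \text{ with the largest absolute value over } i = 1, \dots, N
```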
<div>
<div>Slides and papers I used to understand the algorithm (I understood them, though my explanation may not be perfect):</div>
<div><a href="http://bioinformatics.mdanderson.org/MicroarrayCourse/Lectures09/gsea1_bw.pdf">http://bioinformatics.mdanderson.org/MicroarrayCourse/Lectures09/gsea1_bw.pdf</a></div>
<div><a href="https://bioinformatics.cancer.gov/sites/default/files/course_material/GSEA_Theory.pptx">https://bioinformatics.cancer.gov/sites/default/files/course_material/GSEA_Theory.pptx</a></div>
<div><a href="http://compbio.ucdenver.edu/Hunter_lab/Phang/downloads/files/GSEA.ppt">http://compbio.ucdenver.edu/Hunter_lab/Phang/downloads/files/GSEA.ppt</a></div>
<div><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1239896/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1239896/</a></div>
<div><a href="http://www.baderlab.org/CancerStemCellProject/VeroniqueVoisin/AdditionalResources/GSEA">http://www.baderlab.org/CancerStemCellProject/VeroniqueVoisin/AdditionalResources/GSEA</a></div>
<div>The software also has many parameters you can tune: <a href="http://www.baderlab.org/CancerStemCellProject/VeroniqueVoisin/AdditionalResources/GSEA/parameters">http://www.baderlab.org/CancerStemCellProject/VeroniqueVoisin/AdditionalResources/GSEA/parameters</a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2102.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running GSEA in batch from the command line</title>
		<link>http://www.bio-info-trainee.com/1334.html</link>
		<comments>http://www.bio-info-trainee.com/1334.html#comments</comments>
		<pubDate>Mon, 11 Jan 2016 13:12:48 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic software]]></category>
		<category><![CDATA[GSEA]]></category>
		<category><![CDATA[Batch]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1334</guid>
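		<description><![CDATA[I used the GUI version before, which is very convenient: you only need to prepare the data. But if you have a lot of data, &#8230; <a href="http://www.bio-info-trainee.com/1334.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I used the GUI version before, which is very convenient: you only need to prepare the data. But with many datasets, clicking through file selection and "Next" every single time gets tedious. Since GSEA is a Java program, it can also be driven from the command line!</p>
<div>And once it runs from the command line, batch processing is easy.</p>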
<div><b>1. Installation</b></div>
<p>Just download the Java version from the official website: <b><a href="http://software.broadinstitute.org/gsea/downloads.jsp" target="_blank">http://software.broadinstitute.org/gsea/downloads.jsp</a></b></p>
<div><span style="color: #ff0000;"><b><b>2. Input data</b></b></span></div>
<p>Download the GMT files and prepare your own GCT and CLS files, or simply download the P53 example files.</p>
<div>See: <a href="http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats" target="_blank">http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats</a></p>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/12.png"><img class="alignnone size-full wp-image-1335" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/12.png" alt="1" width="514" height="361" /></a></div>
<div><span style="color: #ff0000;"><b><b>3. Running the command</b></b></span></div>
<p>The hgu95av2 microarray data have only a little over 10,000 probes, so the results come back quickly:</p>
<div> java -cp gsea2-2.2.1.jar  -Xmx1024m xtools.gsea.Gsea   -gmx c2.cp.kegg.v5.0.symbols.gmt  \</div>
<div> -res P53_hgu95av2.gct  -cls P53.cls   -chip  chip/HG_U95Av2.chip  -out result -rpt_label p53_example</div>
<div>There are many more parameters; see <a href="http://software.broadinstitute.org/gsea/doc/linkedFiles/GSEAParameters.txt" target="_blank">http://software.broadinstitute.org/gsea/doc/linkedFiles/GSEAParameters.txt</a></div>
<div>but the defaults are usually fine.</p>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/21.png"><img class="alignnone size-full wp-image-1336" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/21.png" alt="2" width="723" height="405" /></a></div>
<div>The log reports errors about some probes not being found; these can be ignored.</p>
<div><span style="color: #ff0000;"><b><b>4. Output</b></b></span></div>
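<div>Since the command line is what makes batching possible, here is a minimal R sketch (R to match the other code on this blog) that builds one command per gene-set file, reusing exactly the flags from the command above. build_gsea_cmds and the example .gmt file list are assumptions for illustration; uncomment the system() loop to actually run the analyses.</div>

```r
# Hypothetical helper: one GSEA command line per gene-set (.gmt) file.
# The flags are the ones used in the command above; file paths are placeholders.
build_gsea_cmds = function(gmt_files,
                           res  = "P53_hgu95av2.gct",
                           cls  = "P53.cls",
                           chip = "chip/HG_U95Av2.chip",
                           out  = "result",
                           jar  = "gsea2-2.2.1.jar") {
  labels = sub("\\.gmt$", "", basename(gmt_files))   # report label taken from the file name
  sprintf(paste("java -cp %s -Xmx1024m xtools.gsea.Gsea -gmx %s -res %s",
                "-cls %s -chip %s -out %s -rpt_label %s"),
          jar, gmt_files, res, cls, chip, out, labels)
}

cmds = build_gsea_cmds(c("c2.cp.kegg.v5.0.symbols.gmt",
                         "c2.cp.biocarta.v5.0.symbols.gmt"))
print(cmds)
# for (cmd in cmds) system(cmd)   # uncomment to run the analyses one after another
```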
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/32.png"><img class="alignnone size-full wp-image-1337" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/32.png" alt="3" width="823" height="387" /></a></div>
<div>See the official website to interpret these results.</div>
<div><span style="color: #000000;">The data you need to download are listed below:<br />
</span></p>
<div>First download gene sets from the Molecular Signatures Database (MSigDB); usually the C2 KEGG, BioCarta and Reactome collections.</p>
<div><a href="http://software.broadinstitute.org/gsea/downloads.jsp" target="_blank">http://software.broadinstitute.org/gsea/downloads.jsp</a></div>
<div>They are all GMT files.</div>
<div>
<table border="1" width="100%" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<th>CP: Canonical pathways<br />
(<a href="http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=CP">browse 1330 gene sets</a>)</th>
<td>Gene sets from the pathway databases. Usually, these gene sets are canonical representations of a biological process compiled by domain experts. <a href="http://software.broadinstitute.org/gsea/msigdb/collection_details.jsp#CP">details</a></td>
<td>Download GMT Files<br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.v5.0.orig.gmt">original identifiers</a><br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.v5.0.symbols.gmt">gene symbols</a><br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.v5.0.entrez.gmt">entrez genes ids</a></td>
</tr>
<tr>
<th>CP:BIOCARTA: BioCarta gene sets<br />
(<a href="http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=CP:BIOCARTA">browse 217 gene sets</a>)</th>
<td>Gene sets derived from the BioCarta pathway database (<a href="http://www.biocarta.com/genes/index.asp">http://www.biocarta.com/genes/index.asp</a>).</td>
<td>Download GMT Files<br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.biocarta.v5.0.orig.gmt">original identifiers</a><br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.biocarta.v5.0.symbols.gmt" target="_blank">gene symbols</a><br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.biocarta.v5.0.entrez.gmt">entrez genes ids</a></td>
</tr>
<tr>
<th>CP:KEGG: KEGG gene sets<br />
(<a href="http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=CP:KEGG">browse 186 gene sets</a>)</th>
<td>Gene sets derived from the KEGG pathway database (<a href="http://www.genome.jp/kegg/pathway.html">http://www.genome.jp/kegg/pathway.html</a>).</td>
<td>Download GMT Files<br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.kegg.v5.0.orig.gmt">original identifiers</a><br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.kegg.v5.0.symbols.gmt" target="_blank">gene symbols</a><br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.kegg.v5.0.entrez.gmt">entrez genes ids</a></td>
</tr>
<tr>
<th>CP:REACTOME: Reactome gene sets<br />
(<a href="http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=CP:REACTOME">browse 674 gene sets</a>)</th>
<td>Gene sets derived from the Reactome pathway database (<a href="http://www.reactome.org/">http://www.reactome.org/</a>).</td>
<td>Download GMT Files<br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.reactome.v5.0.orig.gmt">original identifiers</a><br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.reactome.v5.0.symbols.gmt" target="_blank">gene symbols</a><br />
<a href="http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/c2.cp.reactome.v5.0.entrez.gmt">entrez genes ids</a></td>
</tr>
</tbody>
</table>
</div>
<div>Then prepare the expression data as a GCT file and the phenotypes as a CLS file.</div>
<div>See: <a href="http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats" target="_blank">http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats</a></div>
<div>and you are ready to run.</div>
<div>For microarray data whose first column contains probe IDs, you also need to download the matching CHIP annotation file: <a href="ftp://ftp.broadinstitute.org/pub/gsea/annotations" target="_blank">ftp://ftp.broadinstitute.org/pub/gsea/annotations</a></div>
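<div>If you would rather collapse probes to gene symbols in R before running GSEA, a minimal sketch keeps, for each sample, the maximum value among all probes of the same gene. This mimics what I understand to be GSEA's default max-probe collapsing, but treat that as an assumption. probe2symbol (a named vector mapping probe IDs to symbols, e.g. built from a CHIP file) and collapse_probes_max are hypothetical.</div>

```r
# Hypothetical helper: collapse a probe-level matrix to gene symbols by taking,
# in each sample, the maximum value among all probes mapping to the same symbol.
# probe2symbol: named character vector, names = probe IDs, values = gene symbols.
collapse_probes_max = function(expr, probe2symbol) {
  sym  = probe2symbol[rownames(expr)]
  keep = !(is.na(sym) | sym == "")                  # drop probes without a symbol
  expr = expr[keep, , drop = FALSE]
  sym  = sym[keep]
  apply(expr, 2, function(v) tapply(v, sym, max))   # genes x samples matrix
}

expr = matrix(c(1, 5, 2, 4, 3, 6), nrow = 3,
              dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
probe2symbol = c(p1 = "TP53", p2 = "TP53", p3 = "EGFR")
gene_expr = collapse_probes_max(expr, probe2symbol)
```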
</div>
</div>
</div>
</div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1334.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Gene set enrichment analysis with GSEA</title>
		<link>http://www.bio-info-trainee.com/1282.html</link>
		<comments>http://www.bio-info-trainee.com/1282.html#comments</comments>
		<pubDate>Wed, 30 Dec 2015 01:17:43 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[GSEA]]></category>
		<category><![CDATA[Enrichment analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1282</guid>
		<description><![CDATA[How to use GSEA? This is somewhat similar to pathway (GO, KEGG, etc &#8230; <a href="http://www.bio-info-trainee.com/1282.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div><span style="color: #ff0000; font-family: FangSong_GB2312;"><b>How to use GSEA?</b></span></div>
<div>This is somewhat similar to pathway (GO, KEGG, etc.) enrichment analysis; the difference is that the concept of a gene set (curated, literature-based databases) is broader, including</div>
<p><b><span style="color: #ff0000; font-family: FangSong_GB2312;">How to download GSEA?</span></b></p>
<div>Download: <a href="http://software.broadinstitute.org/gsea/downloads.jsp" target="_blank">http://software.broadinstitute.org/gsea/downloads.jsp</a></p>
<div>Tutorial: <a href="http://software.broadinstitute.org/gsea/doc/desktop_tutorial.jsp" target="_blank">http://software.broadinstitute.org/gsea/doc/desktop_tutorial.jsp</a></div>
<div>You need to install a Java runtime yourself.</div>
</div>
<p><b><span style="color: #ff0000; font-family: FangSong_GB2312;">What is the input for GSEA?</span></b></p>
<div>According to the documentation, the input data are as follows: GSEA supported data files are simply tab delimited ASCII text files, which have special file extensions that identify them. For example, expression data usually has the extension *.gct, phenotypes *.cls, gene sets *.gmt, and chip annotations *.chip. Click the <b>More on file formats</b> help button to view detailed descriptions of all the data file formats.</div>
<div>
<div>Test data are also provided: <a href="http://software.broadinstitute.org/gsea/datasets.jsp" target="_blank">http://software.broadinstitute.org/gsea/datasets.jsp</a></div>
<div>In practice it is not that complicated: all you need is an expression matrix plus a CLS file describing the sample groups.</div>
<div>The main thing is to read the documentation and produce data in the required formats: <a href="http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats" target="_blank">http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats</a></div>
<div>For the expression matrix, I downloaded the GSE1009 dataset as a test:</div>
<div>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse1009">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse1009</a></div>
<div><a href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1009/matrix/GSE1009_series_matrix.txt.gz" target="_blank">ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1009/matrix/GSE1009_series_matrix.txt.gz</a></div>
</div>
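<div>Turning the downloaded matrix into a GCT file can be sketched like this, assuming expr is a numeric matrix with probe IDs as row names and sample IDs as column names. It writes the GCT v1.2 layout from the Data_formats page: a #1.2 line, a line with the row and sample counts, a header starting with NAME and Description, then one row per probe. write_gct is a hypothetical name and "na" is a placeholder description.</div>

```r
# Hypothetical helper: write a numeric matrix as a GCT (v1.2) file.
# expr: rows = probes/genes (row names = IDs), columns = samples (column names = sample IDs)
write_gct = function(expr, gct_file, description = rep("na", nrow(expr))) {
  header = c(
    "#1.2",                                              # format version line
    paste(nrow(expr), ncol(expr), sep = "\t"),           # number of rows and samples
    paste(c("NAME", "Description", colnames(expr)), collapse = "\t")
  )
  values = apply(expr, 1, paste, collapse = "\t")        # one tab-joined string per row
  body = paste(rownames(expr), description, values, sep = "\t")
  writeLines(c(header, body), gct_file)
}
```

<div>One way to load a GEO series matrix (not tested against GSE1009 specifically) is read.table(gzfile("GSE1009_series_matrix.txt.gz"), header = TRUE, row.names = 1, comment.char = "!"), which skips the metadata lines starting with "!"; wrap it in as.matrix() before calling write_gct.</div>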
<div>
<div>The CLS sample description file can be kept very simple; here is an example:</div>
<div>6 2 1</div>
<div># good bad</div>
<div>good good good bad bad bad</div>
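<div>The file is shown below: six samples with probe-level expression data, the first three in one group and the last three in the other.</div>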
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/clipboard7.png"><img class="alignnone size-full wp-image-1283" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/clipboard7.png" alt="clipboard" width="538" height="427" /></a></div>
</div>
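<div>The three-line CLS layout above (sample count, class count and a constant 1; then # followed by the class names; then one label per sample) can be generated with a small sketch. write_cls is a hypothetical name, and the class order follows the first appearance of each label.</div>

```r
# Hypothetical helper: write a categorical CLS file in the three-line layout shown above.
write_cls = function(groups, cls_file) {
  classes = unique(groups)                            # class names, in order of first appearance
  writeLines(c(
    paste(length(groups), length(classes), 1),        # samples, classes, always 1
    paste("#", paste(classes, collapse = " ")),        # e.g. "# good bad"
    paste(groups, collapse = " ")                      # one label per sample
  ), cls_file)
}

cls_file = tempfile(fileext = ".cls")
write_cls(c(rep("good", 3), rep("bad", 3)), cls_file)
readLines(cls_file)
```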
</div>
<p><b><span style="color: #ff0000; font-family: FangSong_GB2312;">Start running GSEA!</span></b></p>
<div>
<div></div>
<div>First, load the data:</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/clipboard8.png"><img class="alignnone size-full wp-image-1284" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/clipboard8.png" alt="clipboard" width="520" height="389" /></a></div>
<div>Once everything checks out, start the run; a few parameters need to be set:</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/clipboard9.png"><img class="alignnone size-full wp-image-1285" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/clipboard9.png" alt="clipboard" width="759" height="533" /></a></div>
</div>
<p><b><span style="color: #ff0000; font-family: FangSong_GB2312;">What is the output?</span></b></p>
<div>
<div>The output is extensive: every set in the gene set collection you chose is tested against the enrichment criteria, and each enriched set gets its own report.</div>
<div></div>
<div>Click "success" to open the report index page, whose links lead to each individual report.</div>
<div></div>
<div>Its biggest strength is the large collection of gene sets it provides: You can browse the MSigDB from the <a href="http://software.broadinstitute.org/gsea/msigdb/index.jsp" target="_blank">Molecular Signatures Database</a> page of the GSEA web site or the Browse MSigDB page of the GSEA application.</div>
<div></div>
<div>There is also a wiki documentation home page: <span style="color: #000000; font-family: Verdana,Arial,Helvetica,sans-serif;"><a href="http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Main_Page" target="_blank">http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Main_Page</a></span></div>
<div><span style="color: #000000; font-family: Verdana,Arial,Helvetica,sans-serif;"> </span></div>
<div>Some publications that use GSEA:</div>
<div><a href="http://www.ncbi.nlm.nih.gov/pubmed/16199517" target="_blank">www.ncbi.nlm.nih.gov/pubmed/16199517</a></div>
<div><a href="http://stke.sciencemag.org/highwire/filestream/4681053/field_highwire_adjunct_files/1/2001966_Slides.zip" target="_blank">http://stke.sciencemag.org/highwire/filestream/4681053/field_highwire_adjunct_files/1/2001966_Slides.zip</a></div>
<div><a href="http://www.ingentaconnect.com/content/ben/cbio/2007/00000002/00000002/art00003" target="_blank">http://www.ingentaconnect.com/content/ben/cbio/2007/00000002/00000002/art00003</a></div>
<div><a href="http://www.nature.com/articles/ng0704-663a" target="_blank">http://www.nature.com/articles/ng0704-663a</a></div>
<div><a href="http://bioinformatics.oxfordjournals.org/content/23/23/3251.short" target="_blank">http://bioinformatics.oxfordjournals.org/content/23/23/3251.short</a></div>
<div>
<h3><a href="http://link.springer.com/article/10.1007/s00335-011-9359-x" target="_blank">Identification of high-copper-responsive target pathways in Atp7b knockout mouse liver by GSEA on microarray data sets</a></h3>
</div>
<div>
<h3><a href="http://synapse.koreamed.org/search.php?where=aview&amp;id=10.4110/in.2011.11.6.406&amp;code=0078IN&amp;vmode=FULL" target="_blank">Comparison of invariant NKT cells with conventional T cells by using gene set enrichment analysis (GSEA)</a></h3>
</div>
</div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1282.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
