<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; GISTIC</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/gistic/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
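	<description>Feel free to leave a comment and join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>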
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Using GISTIC to find SCNAs from multiple segment files</title>
		<link>http://www.bio-info-trainee.com/1648.html</link>
		<comments>http://www.bio-info-trainee.com/1648.html#comments</comments>
		<pubDate>Thu, 19 May 2016 12:13:36 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[cnv]]></category>
		<category><![CDATA[GISTIC]]></category>
		<category><![CDATA[somatic]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1648</guid>
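		<description><![CDATA[This software is used frequently in the TCGA project. Its purpose is simple: you have studied many &#8230; <a href="http://www.bio-info-trainee.com/1648.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This software is used frequently in the TCGA project. Its purpose is simple: you have studied many cancer samples and obtained copy number change information for each sample from arrays. Array results usually come as segments, which can be read as CNV regions. GISTIC is then used to analyze the samples together, find the somatic CNVs, and annotate them with gene information.</p>
<p>There are two difficulties: first, installing the MATLAB working environment on Linux; second, building the input files.</p>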
<p><span id="more-1648"></span></p>
<div><b>1. Installation</b></div>
<div><b><b>Installation guide: ftp://ftp.broadinstitute.org/pub/GISTIC2.0/INSTALL.txt</b></b></p>
<div>Official site: <a href="http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&amp;paper_id=216&amp;p=t">http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&amp;paper_id=216&amp;p=t</a></div>
<div>Paper: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218867/">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218867/</a></div>
<div>Download: wget ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTIC_2_0_22.tar.gz</div>
<div>The documentation is very detailed: ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm<br />
<b>After unpacking, you have to install the MATLAB Compiler Runtime (MCR) environment yourself, which can be a real hassle!</b></div>
</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/18.png"><img class="alignnone size-full wp-image-1649" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/18.png" alt="1" width="605" height="309" /></a></div>
<div>
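<div><b><span style="color: #ff0000;">2. Preparing the input data</span></b></div>
<div>The input is the segment files obtained by processing SNP 6.0 array raw data with software such as PICNIC or Birdseed.</div>
<div>The segments from multiple samples are merged to form the input data, together with a sample list and some information about the array. Following the example files, the input files are easy to build!</div>
<div>arraylistfile lists all the samples involved in this GISTIC run; usually all samples of one cancer type are run together.</div>
<div>cnvfile is optional and can be left out.</div>
<div>segmentationfile.txt holds the segment information produced from your SNP 6.0 (or similar) arrays, with the results of all samples merged together; a single sample typically has around 1,000 segments.</div>
<div>markersfile.txt depends mainly on your array platform. For the Affymetrix SNP 6.0 array it has more than 900,000 rows, one for each probe.</div>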
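<div>The merging step described above can be sketched in Python. This is a minimal sketch, not part of GISTIC: the six-column order (sample, chromosome, start, end, number of markers, segment mean) and the one-column array list with an "array" header follow my reading of the GISTIC documentation, while the function name <code>merge_segments</code>, the header labels, and the toy values are hypothetical.</div>

```python
import csv
import io

# Column order GISTIC expects in the segmentation file, as I understand the
# documentation; the header labels themselves are illustrative.
SEG_COLUMNS = ["Sample", "Chromosome", "Start", "End", "Num_Probes", "Segment_Mean"]


def merge_segments(per_sample_rows):
    """Merge {sample_id: [(chrom, start, end, n_probes, seg_mean), ...]}
    into segmentation-file text and array-list text."""
    seg_out = io.StringIO()
    writer = csv.writer(seg_out, delimiter="\t", lineterminator="\n")
    writer.writerow(SEG_COLUMNS)
    for sample_id in sorted(per_sample_rows):
        for chrom, start, end, n_probes, seg_mean in per_sample_rows[sample_id]:
            writer.writerow([sample_id, chrom, start, end, n_probes, seg_mean])
    # One sample ID per line under an "array" header.
    array_list = "array\n" + "".join(f"{s}\n" for s in sorted(per_sample_rows))
    return seg_out.getvalue(), array_list


# Hypothetical toy segments for two samples.
toy = {
    "sampleB": [("1", 61735, 1510801, 226, 0.12)],
    "sampleA": [("1", 61735, 900000, 150, -0.45), ("1", 900001, 1510801, 76, 0.03)],
}
seg_text, array_text = merge_segments(toy)
print(seg_text)
print(array_text)
```

<div>In practice the per-sample rows would be read from the PICNIC or Birdseed output files; the dictionary here only stands in for that step.</div>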
<div></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/23.png"><img class="alignnone size-full wp-image-1650" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/23.png" alt="2" width="420" height="92" /></a></div>
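<div>The test data bundled with the software is shown above: 106 samples and a little over 20,000 segments in total, which means only about 200 segments per sample on average, so it may be PICNIC output for SNP 6.0 array data. However, its</p>
<div>
<div>markersfile.txt clearly contains only a little over 100,000 markers (that is, probes), so the data is probably not from the</p>
<div>SNP 6.0 array.</div>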
</div>
</div>
</div>
<div>
<div>    106 arraylistfile.txt</div>
<div>  12942 cnvfile.txt</div>
<div> 115593 markersfile.txt</div>
<div>  20521 segmentationfile.txt</div>
</div>
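<div>The arithmetic behind that reasoning can be checked with a few lines of Python. The line counts are the ones listed above; the figure of 900,000 probes is only the round lower bound quoted in this post for SNP 6.0, and the counts may include header lines, so the results are approximate.</div>

```python
# Line counts reported above for the bundled example data.
line_counts = {
    "arraylistfile.txt": 106,
    "cnvfile.txt": 12942,
    "markersfile.txt": 115593,
    "segmentationfile.txt": 20521,
}

# Average number of segments per sample.
segments_per_sample = line_counts["segmentationfile.txt"] / line_counts["arraylistfile.txt"]
print(f"average segments per sample: {segments_per_sample:.1f}")  # 193.6

# Share of markers relative to the 900,000 probes quoted for SNP 6.0.
marker_fraction = line_counts["markersfile.txt"] / 900000
print(f"markers relative to 900,000 probes: {marker_fraction:.1%}")  # 12.8%
```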
<div>
<div></div>
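<div><b><span style="color: #ff0000;">3. Running the program</span></b></div>
<div><span style="color: #ff0000;">The run script shipped with the software uses csh; I changed it to bash.</span></div>
<div><span style="color: #ff0000;">You also need to change the MATLAB path and the reference genome build.</span></div>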
</div>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/34.png"><img class="alignnone size-full wp-image-1651" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/34.png" alt="3" width="880" height="471" /></a></p>
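<p>As a rough illustration of what such a run script assembles, the sketch below builds the command line in Python without executing it. The flag names (<code>-b</code>, <code>-seg</code>, <code>-mk</code>, <code>-refgene</code>, <code>-alf</code>, <code>-cnv</code>, <code>-conf</code>) are written from my recollection of the example script bundled with GISTIC and should be checked against your copy; the function <code>build_gistic_command</code> and all paths are hypothetical, and setting up the MATLAB runtime environment is not covered here.</p>

```python
import shlex


def build_gistic_command(executable, base_dir, seg_file, markers_file,
                         refgene_file, array_list_file=None, cnv_file=None,
                         confidence=0.90):
    """Assemble the argument list for one GISTIC run without executing it."""
    cmd = [executable, "-b", base_dir, "-seg", seg_file,
           "-mk", markers_file, "-refgene", refgene_file]
    if array_list_file is not None:
        cmd += ["-alf", array_list_file]
    if cnv_file is not None:  # the CNV file is optional
        cmd += ["-cnv", cnv_file]
    cmd += ["-conf", f"{confidence:.2f}"]
    return cmd


# All paths below are placeholders; adjust them to your installation.
cmd = build_gistic_command(
    executable="./gp_gistic2_from_seg",
    base_dir="example_results",
    seg_file="examplefiles/segmentationfile.txt",
    markers_file="examplefiles/markersfile.txt",
    refgene_file="refgenefiles/hg19.mat",  # pick the file matching your genome build
    array_list_file="examplefiles/arraylistfile.txt",
)
print(shlex.join(cmd))
```

<p>Keeping the command as a list makes it easy to swap the reference genome file or drop the optional inputs before handing it to a process runner.</p>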
</div>
<div>
<div>
<div><b><span style="color: #ff0000;">4. Interpreting the output</span></b></div>
</div>
</div>
<div></div>
<p><span lang="ZH-CN">A brief explanation of the files in the output directory:</span></p>
<p><span lang="EN-US">all_data_by_genes.txt </span>gives the actual copy number value of each gene (including non-coding RNAs such as miRNA and lncRNA) in each sample.</p>
<p><span lang="EN-US">all_lesions.conf_90.txt </span>lists the amplification and deletion peak regions that were identified.</p>
<p><span lang="EN-US">all_thresholded.by_genes.txt </span>holds the discretized values: -2 means two copies lost, -1 one copy lost, 0 normal copy number, 1 one copy gained, and 2 two copies gained. These are threshold-based call levels, so the copy counts are an approximate reading.</p>
<p><span lang="EN-US">broad_significance_results.txt </span>lists the broad regions with significant copy number alterations.</p>
<p><span lang="EN-US">broad_values_by_arm.txt </span>gives the copy number value of each chromosome arm in each sample.</p>
<p><span lang="EN-US">scores.gistic </span>holds the scores computed by the method.</p>
<div><span style="color: #000000;"><strong>I wrote this tutorial around the summer of 2016. It is now autumn 2017 and the software has been updated again: it adds support for the hg38 reference genome, and the scripts have been switched from csh to bash. Great!</strong></span></div>
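<p>To make the discretized values concrete, here is a small Python sketch that tallies the calls per gene. It assumes the table is tab-delimited with three leading annotation columns (gene symbol, locus ID, cytoband) followed by one column per sample, which is how I remember the file layout; the function <code>tally_calls</code> and the toy table are hypothetical, and the labels follow the reading given in this post.</p>

```python
import csv
import io
from collections import Counter

# Labels follow the reading given in the post.
CALL_LABELS = {-2: "loss of two copies", -1: "loss of one copy", 0: "normal",
               1: "gain of one copy", 2: "gain of two copies"}
N_ANNOTATION_COLUMNS = 3  # assumed: gene symbol, locus ID, cytoband


def tally_calls(table_text):
    """Return {gene: Counter mapping each label to its sample count}."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader)  # skip the header row
    tallies = {}
    for row in reader:
        gene = row[0]
        calls = [int(v) for v in row[N_ANNOTATION_COLUMNS:]]
        tallies[gene] = Counter(CALL_LABELS[c] for c in calls)
    return tallies


# Hypothetical toy table with two genes and three samples.
toy_table = (
    "Gene Symbol\tLocus ID\tCytoband\tS1\tS2\tS3\n"
    "GENE_A\t1\t1p36.33\t-2\t0\t-1\n"
    "GENE_B\t2\t8q24.21\t2\t1\t2\n"
)
for gene, counts in tally_calls(toy_table).items():
    print(gene, dict(counts))
```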
<div> 2.0.23 (2017-03-27) - The markers file input is now optional - if omitted, pseudo-markers will be<br />
generated to satisfy GISTIC's input requirements while ensuring reasonably<br />
uniform coverage of the genome.<br />
- The "broad analysis" of arm-level events has been revised:<br />
(1) arm-level events are now called from a single broad copy number profile<br />
instead of separate amplification and deletion profiles, which had led to<br />
arms counterintuitively called as amplified and deleted on the same sample;<br />
(2) the frequency scores used to determine z-scores and q-values, which excludes<br />
arms with the opposite call from the denominator, are now in a column called<br />
"frequency score". A new column called "frequency" gives the intuitive frequency<br />
with the denominator including arms from all the samples. The analysis results<br />
for the same data will be different from that of previous GISTIC versions.<br />
- Error handling messages have been improved. In particular, many informative<br />
error messages were masked by an "Index exceeds matrix dimensions" error<br />
in the exception handler itself.<br />
- An hg38 reference genome is included with this release.<br />
- The gp_gistic2_from_seg binary executable is now compiled for MCR 8.3<br />
(Matlab R2014a). The source code is compatible with versions of Matlab up to<br />
R2016a, however, the appearance of output graphics may be altered for Matlab<br />
versions R2015a and later.<br />
- This release adds the convenient 'gistic2' wrapper function which sets up<br />
the MCR and passes its command line argument to the executable. Scripts have<br />
been converted from the C-shell to the Bourne shell.<br />
</div>
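<div>The difference between the two columns described in the release notes can be illustrated with a toy calculation. This is only my reading of the note, not GISTIC's own code: the "frequency" divides by all samples, while the "frequency score" drops samples with the opposite call from the denominator. The function <code>arm_frequencies</code> and the call labels are hypothetical.</div>

```python
def arm_frequencies(calls, direction):
    """calls: per-sample calls for one arm, each "amp", "del" or "none".
    direction: "amp" or "del". Returns (frequency, frequency_score)."""
    if not calls:
        raise ValueError("calls must not be empty")
    opposite = "del" if direction == "amp" else "amp"
    n_hit = sum(1 for c in calls if c == direction)
    n_opposite = sum(1 for c in calls if c == opposite)
    # Intuitive frequency: every sample counts in the denominator.
    frequency = n_hit / len(calls)
    # Frequency score: samples with the opposite call are left out of the denominator.
    denominator = len(calls) - n_opposite
    frequency_score = n_hit / denominator if denominator else 0.0
    return frequency, frequency_score


# Five hypothetical samples for one chromosome arm.
calls = ["amp", "amp", "del", "none", "none"]
print(arm_frequencies(calls, "amp"))  # (0.4, 0.5)
```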
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1648.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
