<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; cnv</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/cnv/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
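	<description>You are welcome to leave comments and join the discussion on the forum at biotrainee.com, or to follow the WeChat official account of the same name, biotrainee.</description>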
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Inferring CNV from single-cell transcriptome data</title>
		<link>http://www.bio-info-trainee.com/3065.html</link>
		<comments>http://www.bio-info-trainee.com/3065.html#comments</comments>
		<pubDate>Sat, 17 Feb 2018 10:17:02 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[cnv]]></category>
		<category><![CDATA[单细胞]]></category>
		<category><![CDATA[转录组]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=3065</guid>
		<description><![CDATA[Inferring CNV from single-cell transcriptome data. A series of papers, all from the Aviv Regev lab, make use of &#8230; <a href="http://www.bio-info-trainee.com/3065.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h1 class="md-end-block md-heading md-focus" contenteditable="true"><span class="">Inferring CNV from single-cell transcriptome data</span></h1>
<p><span class="md-line md-end-block" contenteditable="true"><span class="">A series of papers, all from the Aviv Regev lab, have used single-cell transcriptome data to infer CNV.</span></span><span id="more-3065"></span></p>
<h3 class="md-end-block md-heading" contenteditable="true"><span class=""> The 2014 Science paper on GBM</span></h3>
<p><span class="md-line md-end-block" contenteditable="true"><span class="">The first is the 2014 Science paper on GBM, PMID: </span><span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/pubmed/24925914">24925914</a></span>, which introduced this analysis and also used the CCLE database to check its reliability.</span></p>
<p><span class="md-line md-end-block" contenteditable="true">The paper's own single-cell transcriptome libraries were prepared with the SMART-seq protocol, and the data are available as <span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE57872">GSE57872</a></span>.</span></p>
<ul class="ul-list" data-mark="-">
<li><span class="md-line md-end-block" contenteditable="true">430(576) single glioblastoma cells isolated from 5 individual tumors</span></li>
<li><span class="md-line md-end-block" contenteditable="true">102(192) single cells from gliomasphere cell lines</span></li>
</ul>
<p><span class="md-line md-end-block" contenteditable="true">This single-cell library preparation method is somewhat dated by now:</span></p>
<blockquote><p><span class="md-line md-end-block" contenteditable="true">SMART-seq protocol was implemented to generate single cell full length transcriptomes (modified from Shalek, et al Nature 2013) and sequenced using 25 bp paired end reads. Single cell cDNA libraries for <span class=""><strong>MGH30 were resequenced using 100 bp paired end reads</strong></span> to allow for isoform and splice junction reconstruction (96 samples, annotated MGH30L). </span></p></blockquote>
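<p><span class="md-line md-end-block" contenteditable="true"><span class="">The authors therefore filtered the cells fairly strictly. You can download their processed expression matrix directly, or download the raw sequencing data and run a transcriptome pipeline yourself.</span></span></p>
<p><span class="md-line md-end-block" contenteditable="true">The formula, as first proposed, is shown below:</span></p>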
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-image md-img-loaded" contenteditable="false" data-src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/RNA-SEQ-CNV-formula-1.png"><img style="box-sizing: border-box; border-width: 0px 4px 0px 2px; border-right-style: solid; border-left-style: solid; border-right-color: transparent; border-left-color: transparent; vertical-align: middle; max-width: 100%; cursor: default;" src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/RNA-SEQ-CNV-formula-1.png" alt="" /></span></span></p>
<h3 class="md-end-block md-heading" contenteditable="true"><span class=""> The 2016 Science paper on melanoma</span></h3>
<p><span class="md-line md-end-block" contenteditable="true"><span class="">Next is the 2016 Science paper on melanoma, PMID: </span><span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/pubmed/27124452">27124452</a></span>, which also inferred CNV from single-cell transcriptome data. Its data are available as <span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE72056">GSE72056</a></span>. This time the libraries were prepared with Smart-seq2, for 4645 cells in total, and the expression matrix alone is about 71 Mb. The raw sequencing data are in dbGaP and can only be downloaded after applying for access.</span></p>
<figure class="md-table-fig" contenteditable="false">
<table class="md-table">
<thead>
<tr class="md-end-block">
<th><span class="td-span" contenteditable="true"><span class=""><strong>Supplementary file</strong></span></span></th>
<th><span class="td-span" contenteditable="true"><span class=""><strong>Size</strong></span></span></th>
<th><span class="td-span" contenteditable="true"><span class=""><strong>Download</strong></span></span></th>
<th><span class="td-span" contenteditable="true"><span class=""><strong>File type/resource</strong></span></span></th>
</tr>
</thead>
<tbody>
<tr class="md-end-block">
<td><span class="td-span" contenteditable="true">GSE72056_melanoma_single_cell_revised_v2.txt.gz</span></td>
<td><span class="td-span" contenteditable="true">71.6 Mb</span></td>
<td><span class="td-span" contenteditable="true"><span class=""><a spellcheck="false" href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE72nnn/GSE72056/suppl/GSE72056_melanoma_single_cell_revised_v2.txt.gz">(ftp)</a></span><span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE72056&amp;format=file&amp;file=GSE72056%5Fmelanoma%5Fsingle%5Fcell%5Frevised%5Fv2%2Etxt%2Egz">(http)</a></span></span></td>
<td><span class="td-span" contenteditable="true">TXT</span></td>
</tr>
</tbody>
</table>
</figure>
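<p>The (ftp) link in the table follows a regular pattern: the series accession with its last three digits replaced by <code>nnn</code>, then the accession itself, then <code>suppl/</code>. The GSE103322 link further down has the same shape. Below is a minimal Python sketch that builds such a link. It assumes the pattern holds for other GEO series too, and the function name <code>geo_suppl_url</code> is made up for illustration.</p>

```python
def geo_suppl_url(accession, filename):
    """Build the FTP URL of a GEO series supplementary file.

    Assumption: the directory layout matches the two links quoted in this
    post (GSE72056 under GSE72nnn, GSE103322 under GSE103nnn).
    """
    if not accession.startswith("GSE") or not accession[3:].isdigit():
        raise ValueError("expected a series accession such as GSE72056")
    digits = accession[3:]
    stub = "GSE" + digits[:-3] + "nnn"  # last three digits become 'nnn'
    return ("ftp://ftp.ncbi.nlm.nih.gov/geo/series/"
            + stub + "/" + accession + "/suppl/" + filename)


url = geo_suppl_url("GSE72056",
                    "GSE72056_melanoma_single_cell_revised_v2.txt.gz")
print(url)
```

<p>The printed URL can be passed to wget or any FTP-capable downloader.</p>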
<blockquote><p><span class="md-line md-end-block" contenteditable="true"><span class="">we applied single-cell RNA sequencing (RNA-seq) to 4645 single cells isolated from 19 patients, profiling malignant, immune, stromal, and endothelial cells.</span></span></p></blockquote>
<p><span class="md-line md-end-block" contenteditable="true">Note that the authors also ran bulk RNA-seq for 6 treatments with RAF or RAF+MEK inhibitors, before and after treatment, giving 12 datasets in total, available as <span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE77940">GSE77940</a></span>.</span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="">By this point the formula had changed slightly, as shown below:</span></span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-image md-img-loaded" contenteditable="false" data-src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/rna-seq-cnv-formula2.png"><img style="box-sizing: border-box; border-width: 0px 4px 0px 2px; border-right-style: solid; border-left-style: solid; border-right-color: transparent; border-left-color: transparent; vertical-align: middle; max-width: 100%; cursor: default;" src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/rna-seq-cnv-formula2.png" alt="" /></span></span></p>
<h3 class="md-end-block md-heading" contenteditable="true"><span class=""> The 2017 Cell paper on head and neck cancer</span></h3>
<p><span class="md-line md-end-block" contenteditable="true">Next is the head and neck cancer paper published in Cell in 2017: <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/science/article/pii/S0092867417312709?via%3Dihub">Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer</a></span>. The sequencing was as follows:</span></p>
<blockquote><p><span class="md-line md-end-block" contenteditable="true">We profiled <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/topics/neuroscience/transcriptome">transcriptomes</a></span> of <span class=""><strong>∼6,000 single cells from 18 head and neck <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/topics/neuroscience/squamous-epithelial-cell">squamous cell</a></span> carcinoma</strong></span> (HNSCC) patients, including five matched pairs of primary tumors and <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/topics/neuroscience/lymph-node">lymph node metastases</a></span>.</span></p></blockquote>
<p><span class="md-line md-end-block" contenteditable="true">Whole-exome sequencing (WES) and targeted <span class=""><a spellcheck="false" href="https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/genotyping">genotyping</a></span> (SNaPshot) data were also generated for these patients, but they are deposited under the accession <span spellcheck="false"><code>phs001474.v1.p1</code></span>, which is not very convenient to download.</span></p>
<p><span class="md-line md-end-block" contenteditable="true">The single-cell libraries were prepared with the <span spellcheck="false"><code>Smart-seq2</code></span> protocol, and all the data are available as <span class=""><a spellcheck="false" href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE103322">GSE103322</a></span>. The expression matrix alone is close to 100 Mb (86.0 Mb compressed).</span></p>
<pre class=" CodeMirror-line ">GSE103322_HNSCC_all_data.txt.gz | 86.0 Mb |</pre>
<p><span class="md-line md-end-block" contenteditable="true">Download links: <span class=""><a spellcheck="false" href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE103nnn/GSE103322/suppl/GSE103322%5FHNSCC%5Fall%5Fdata%2Etxt%2Egz">(ftp)</a></span><span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE103322&amp;format=file&amp;file=GSE103322%5FHNSCC%5Fall%5Fdata%2Etxt%2Egz">(http)</a></span> </span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-image md-img-loaded" contenteditable="false" data-src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/rna-seq-cnv-formula-3.png"><img style="box-sizing: border-box; border-width: 0px 4px 0px 2px; border-right-style: solid; border-left-style: solid; border-right-color: transparent; border-left-color: transparent; vertical-align: middle; max-width: 100%; cursor: default;" src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/rna-seq-cnv-formula-3.png" alt="" /></span></span></p>
<h3 class="md-end-block md-heading" contenteditable="true">Validation with CCLE data</h3>
<p><span class="md-line md-end-block" contenteditable="true">The 2014 Science paper on GBM (PMID: <span class=""><a spellcheck="false" href="https://www.ncbi.nlm.nih.gov/pubmed/24925914">24925914</a></span><span class="">) states:</span></span></p>
<blockquote><p><span class="md-line md-end-block" contenteditable="true">We downloaded the CCLE gene-centric RMA-normalized Affymetrix data (<span spellcheck="false"><a href="http://www.broadinstitute.org/ccle/">http://www.broadinstitute.org/ccle/</a></span>), and centered the expression of each gene across all cell lines at zero.</span></p></blockquote>
<p><span class="md-line md-end-block" contenteditable="true">A simple registration is required before downloading: <span spellcheck="false"><a href="https://portals.broadinstitute.org/ccle/users/sign_in">https://portals.broadinstitute.org/ccle/users/sign_in</a></span> </span></p>
<p><span class="md-line md-end-block" contenteditable="true">In principle, the analysis should reproduce the figure below:</span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-image md-img-loaded" contenteditable="false" data-src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/highly-correlated-CNV-by-SNP6array-and-RNA-seq.png"><img style="box-sizing: border-box; border-width: 0px 4px 0px 2px; border-right-style: solid; border-left-style: solid; border-right-color: transparent; border-left-color: transparent; vertical-align: middle; max-width: 100%; cursor: default;" src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/highly-correlated-CNV-by-SNP6array-and-RNA-seq.png" alt="" /></span></span></p>
<p><span class="md-line md-end-block" contenteditable="true"><span class="md-expand">This shows that the CNV profile inferred from transcriptome data differs little from the SNP 6.0 array results.</span></span></p>
<h3 class="md-end-block md-heading" contenteditable="true">Validation with the GTEx database</h3>
<p><span class="md-line md-end-block" contenteditable="true">To compare these patterns to an external reference of normal cells we downloaded RNA-Seq data from the GTEX portal (<span spellcheck="false"><a href="http://www.gtexportal.org/">http://www.gtexportal.org/</a></span><span class="">; gene read counts file from Jan. 2013), and estimated CNV values as above: we normalized the read counts into log2(TPM+1), averaged all brain samples, restricted the data to the ~6,000 analyzed genes, subtracted for each gene the average normalized expression from the GBM single-cell data (this step is comparable to the centering of the single cell data) and then used a moving average of 100 genes over the genomically-ordered list of genes to define CNV-cont.</span></span></p>
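<p>The procedure quoted above (log2(TPM+1), per-gene centering, then a moving average over genomically ordered genes) can be sketched in plain Python. This is a minimal illustration, not the authors' code. The function names are invented here, the genes are assumed to be already sorted by genomic position, the window is truncated at the ends of the list, and chromosome boundaries are ignored for brevity.</p>

```python
import math


def relative_expression(tpm_by_cell):
    """log2(TPM+1) per cell, then center each gene across cells at zero."""
    logged = [[math.log2(v + 1.0) for v in cell] for cell in tpm_by_cell]
    n_cells = len(logged)
    n_genes = len(logged[0])
    means = [sum(cell[g] for cell in logged) / n_cells for g in range(n_genes)]
    return [[cell[g] - means[g] for g in range(n_genes)] for cell in logged]


def moving_average(values, window=100):
    """Centered moving average over positions i-half .. i+half.

    Assumption: near the ends the window is simply truncated.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def infer_cnv(tpm_by_cell, window=100):
    """Rows are cells; columns are genes already sorted by genomic position."""
    centered = relative_expression(tpm_by_cell)
    return [moving_average(cell, window) for cell in centered]


# Toy input (invented numbers): cell 0 has elevated expression in genes 2-3.
toy_tpm = [
    [1, 1, 7, 7, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
for profile in infer_cnv(toy_tpm, window=2):
    print([round(v, 2) for v in profile])
```

<p>With real data the window would be 100 genes, as in the quoted text, and the smoothed profile of each cell is then read along the chromosome to spot gains and losses.</p>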
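<h3 class="md-end-block md-heading" contenteditable="true">Summary</h3>
<p><span class="md-line md-end-block" contenteditable="true">All of the papers above provide downloadable expression matrices, so the whole workflow can be reproduced using only the formulas published in their supplementary materials.</span></p>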
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/3065.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding SCNAs from multiple segment files with GISTIC</title>
		<link>http://www.bio-info-trainee.com/1648.html</link>
		<comments>http://www.bio-info-trainee.com/1648.html#comments</comments>
		<pubDate>Thu, 19 May 2016 12:13:36 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[cnv]]></category>
		<category><![CDATA[GISTIC]]></category>
		<category><![CDATA[somatic]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1648</guid>
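		<description><![CDATA[This software is used frequently in the TCGA project. Its purpose is simple: you have studied many &#8230; <a href="http://www.bio-info-trainee.com/1648.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This software is used frequently in the TCGA project. Its purpose is simple: you have studied many cancer samples and obtained the copy-number changes of each sample from arrays. Array results usually come as segments, which can be interpreted as CNV regions. GISTIC is then used to analyze the samples jointly, find the somatic CNVs, and annotate them with gene information.</p>
<p>There are two difficulties: installing the MATLAB runtime environment on Linux, and preparing the input files.</p>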
<p><span id="more-1648"></span></p>
<div><b>1. Installing the program</b></div>
<div><b><b>Installation guide: ftp://ftp.broadinstitute.org/pub/GISTIC2.0/INSTALL.txt</b></b></p>
<div>Official site: <a href="http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&amp;paper_id=216&amp;p=t">http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&amp;paper_id=216&amp;p=t</a></div>
<div>Paper: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218867/">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218867/</a></div>
<div>Download: wget ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTIC_2_0_22.tar.gz</div>
<div>Its documentation is very detailed: ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm<br />
<b>After unpacking, you have to install the MATLAB runtime environment yourself, which can be quite a hassle!</b></div>
</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/18.png"><img class="alignnone size-full wp-image-1649" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/18.png" alt="1" width="605" height="309" /></a></div>
<div>
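<div><b><span style="color: #ff0000;">2. Preparing the input data</span></b></div>
<div>The inputs are the segment files obtained by processing raw SNP 6.0 array data with software such as PICNIC or Birdseed.</div>
<div>The segments from multiple samples are merged to form the input, together with a sample list and some information about the array. Following the example files, the input files are easy to put together.</div>
<div>arraylistfile lists all the samples included in this GISTIC run; usually one cancer type is run together.</div>
<div>The cnvfile can be omitted.</div>
<div>segmentationfile.txt holds the segment information produced from SNP 6.0 or similar arrays, with the results of all samples merged together; a sample typically has on the order of 1,000 segments.</div>
<div>markersfile.txt depends mainly on your array platform. For the Affymetrix SNP 6.0 array it has more than 900,000 rows, with information for every probe.</div>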
<div></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/23.png"><img class="alignnone size-full wp-image-1650" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/23.png" alt="2" width="420" height="92" /></a></div>
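<div>The test data bundled with the software are shown above: 106 samples and a little over 20,000 segments in total, which works out to only about 200 segments per sample on average. They could be PICNIC output from SNP 6.0 array data. However, its</p>
<div>
<div>markersfile.txt clearly lists only a little over 100,000 markers (that is, probes), so the data are probably not from the</p>
<div>SNP 6.0 array.</div>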
</div>
</div>
</div>
<div>
<div>    106 arraylistfile.txt</div>
<div>  12942 cnvfile.txt</div>
<div> 115593 markersfile.txt</div>
<div>  20521 segmentationfile.txt</div>
</div>
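<p>The segments-per-sample estimate can be checked with a few lines of Python. The sketch below counts segments per sample in a tab-delimited segmentation file whose first column is the sample name. The six-column layout and the header names in the demo are assumptions modeled on the GISTIC example files, the function name <code>segments_per_sample</code> is made up, and the demo rows are invented.</p>

```python
import csv
import io
from collections import Counter


def segments_per_sample(handle, has_header=True):
    """Count rows per sample; the sample name is taken from column 1."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line if there is one
    counts = Counter()
    for row in reader:
        if row:  # ignore blank lines
            counts[row[0]] += 1
    return counts


# Invented demo rows in the assumed six-column layout.
demo = io.StringIO(
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "S1\t1\t61735\t1510801\t226\t0.0174\n"
    "S1\t1\t1627918\t1672603\t17\t-0.7116\n"
    "S2\t1\t61735\t247813706\t129817\t0.0021\n"
)
counts = segments_per_sample(demo)
print(dict(counts))

# The line counts listed above: 20521 segment lines over 106 sample lines.
average = round(20521 / 106, 1)
print(average)
```

<p>To run it on the real file, open segmentationfile.txt and pass the file handle instead of the demo text.</p>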
<div>
<div></div>
<div><b><span style="color: #ff0000;">3. Running the program</span></b></div>
<div><span style="color: #ff0000;">The run script shipped with the software uses csh; I changed it to bash.</span></div>
<div><span style="color: #ff0000;">The MATLAB path and the genome build also need to be edited.</span></div>
</div>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/34.png"><img class="alignnone size-full wp-image-1651" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/34.png" alt="3" width="880" height="471" /></a></p>
</div>
<div>
<div>
<div><b><span style="color: #ff0000;">4. Interpreting the output</span></b></div>
</div>
</div>
<div></div>
<p><span lang="EN-US">A brief explanation of the files in the output directory:</span></p>
<p><span lang="EN-US">all_data_by_genes.txt </span>gives the copy-number value of each gene (including non-coding RNAs such as miRNA and lncRNA) in each sample.</p>
<p><span lang="EN-US">all_lesions.conf_90.txt </span>lists the identified amplification and deletion peak regions.</p>
<p><span lang="EN-US">all_thresholded.by_genes.txt </span>holds the discretized values. Roughly, -2 means loss of two copies, -1 loss of one copy, 0 normal copy number, 1 gain of one copy, and 2 gain of two copies.</p>
<p><span lang="EN-US">broad_significance_results.txt </span>lists the broad regions with significant copy-number alterations.</p>
<p><span lang="EN-US">broad_values_by_arm.txt </span>gives the copy-number value of each chromosome arm in each sample.</p>
<p><span lang="EN-US">scores.gistic </span>contains the scores assigned by the method.</p>
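<p>As a rough illustration of how the discretized table might be used, the Python sketch below counts, for each gene, how many samples carry a gain or a loss. It assumes a tab-delimited file with three annotation columns (gene symbol, locus ID, cytoband) followed by one column per sample; this layout should be checked against your own output. The function name <code>summarize_thresholded</code> is made up and the demo calls are invented.</p>

```python
import csv
import io


def summarize_thresholded(handle, n_annotation_cols=3):
    """Count gains and losses per gene from a table of -2..2 calls."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    summary = {}
    for row in reader:
        calls = [int(v) for v in row[n_annotation_cols:]]
        summary[row[0]] = {
            "amp": sum(1 for v in calls if v > 0),
            "del": sum(1 for v in calls if v < 0),
            "deep_del": calls.count(-2),
        }
    return summary


# Invented demo table in the assumed layout.
demo = io.StringIO(
    "Gene Symbol\tLocus ID\tCytoband\tS1\tS2\tS3\n"
    "CDKN2A\t1029\t9p21.3\t-2\t-1\t0\n"
    "EGFR\t1956\t7p11.2\t2\t1\t0\n"
)
summary = summarize_thresholded(demo)
for gene, counts in summary.items():
    print(gene, counts)
```

<p>Sorting the genes by their "amp" or "del" count gives a quick view of the most frequently altered genes in the cohort.</p>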
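<div><span style="color: #000000;"><strong>I wrote this tutorial around the summer of 2016. It is now autumn 2017 and the software has been updated again: it now handles the hg38 reference genome, and the scripts have been switched from csh to bash. Great!</strong></span></div>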
<div> 2.0.23 (2017-03-27) - The markers file input is now optional - if omitted, pseudo-markers will be<br />
generated to satisfy GISTIC's input requirements while ensuring reasonably<br />
uniform coverage of the genome.<br />
- The "broad analysis" of arm-level events has been revised:<br />
(1) arm-level events are now called from a single broad copy number profile<br />
instead of separate amplification and deletion profiles, which had led to<br />
arms counterintuitively called as amplified and deleted on the same sample;<br />
(2) the frequency scores used to determine z-scores and q-values, which excludes<br />
arms with the opposite call from the denominator, are now in a column called<br />
"frequency score". A new column called "frequncy" gives the intuitive frequency<br />
with the denominator inluding arms from all the samples. The analysis results<br />
for the same data will be different from that of previous GISTIC versions.<br />
- Error handling messages have been improved. In particular, many informative<br />
error messages were masked by an "Index exceeds matrix dimensions" error<br />
in the exception handler itself.<br />
- An hg38 reference genome is included with this release.<br />
- The gp_gistic2_from_seg binary executable is now compiled for MCR 8.3<br />
(Matlab R2014a). The source code is compatible with versions of Matlab up to<br />
R2016a, however, the appearance of output graphics may be altered for Matlab<br />
versions R2015a and later.<br />
- This release adds the convenient 'gistic2' wrapper function which sets up<br />
the MCR and passes its command line argument to the executable. Scripts have<br />
been converted from the C-shell to the Bourne shell.<br />
</div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1648.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An introduction to the copy-number variation array</title>
		<link>http://www.bio-info-trainee.com/1295.html</link>
		<comments>http://www.bio-info-trainee.com/1295.html#comments</comments>
		<pubDate>Wed, 06 Jan 2016 01:00:08 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础数据格式]]></category>
		<category><![CDATA[cnv]]></category>
		<category><![CDATA[snp]]></category>
		<category><![CDATA[拷贝数]]></category>
		<category><![CDATA[芯片]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1295</guid>
		<description><![CDATA[The copy-number variation array discussed here is the Affymetrix Genome-Wide Hu &#8230; <a href="http://www.bio-info-trainee.com/1295.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>The copy-number variation array discussed here is the Affymetrix Genome-Wide Human SNP Array 6.0.</p>
<div>Its output is CEL data, which must be processed into segment and genotype data.</div>
</div>
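<div>This array is used very heavily in the TCGA project, where it is standard. Just remember that it is an array for copy-number variation detection that can also call genotypes.</div>
<div>The Affymetrix Genome-Wide Human SNP Array 6.0 is described by its vendor as the only platform that can truly turn CNPs (copy-number polymorphisms) into a high-resolution reference map. Its main applications include genome-wide SNP genotyping, genome-wide CNV typing, genome-wide association analysis, and genome-wide linkage analysis. Beyond genotyping, it also supports copy-number and LOH studies, enabling UPD detection, paternity testing, parent-of-origin analysis of abnormalities (for UPD and deletions), homozygosity analysis, and kinship determination.</div>
<div>Reference: <a href="http://www.affymetrix.com/support/technical/byproduct.affx?product=genomewidesnp_6">http://www.affymetrix.com/support/technical/byproduct.affx?product=genomewidesnp_6</a></div>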
<div></div>
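<div>The SNP Array 6.0 is the new-generation SNP array that Affymetrix released after the Mapping 10K, 100K, and 500K arrays and the SNP 5.0 array. A single array can genotype <b>906,600 SNPs</b> in one sample. About 482,000 of these SNPs come from the earlier 500K and SNP 5.0 arrays. The remaining 424,000 include tag SNPs from the International HapMap Project, more representative SNPs on the X and Y chromosomes and in mitochondrial DNA, and SNPs from recombination hotspots or added to dbSNP after the 500K array was designed. <b>The array also carries 946,000 non-polymorphic CNV probes</b> for detecting copy-number variation. Of these, 202,000 probes target 5,677 known copy-number variation regions taken from the Toronto Database of Genomic Variants; these regions resolve into 3,182 non-overlapping segments, each covered by 61 probes on average. Beyond these known copy-number polymorphic regions, more than 744,000 probes are spread evenly across the genome to discover unknown copy-number variation regions. The SNP and CNV probes are distributed densely and evenly over the whole genome<b>, serving as a tool for detecting copy-number variation and loss of heterozygosity (LOH) that can reveal small chromosomal gains and losses</b>. This gives life-science researchers a powerful tool for improving the chances of finding genes associated with complex diseases.<br />
Through a collaboration with the Broad Institute, which is jointly run with Harvard University, the SNP 6.0 array is reported to reach a new level of data accuracy and consistency. The accompanying Genotyping Console software handles SNP 6.0 array data, genome-wide genetic analysis, and quality control.</div>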
<div>
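<p><strong>Product features:</strong></p>
<p>1. Covers more than 1,800,000 markers of genetic variation, including more than <b>906,600 SNPs and more than 946,000 probes for detecting copy number variation (CNV)</b>;</p>
<p>2. The SNP and CNV probes are distributed densely and evenly over the whole genome, so the array can be used both for accurate SNP genotyping and for CNV studies;</p>
<p>3. 744,000 probes are spread evenly across the genome to discover unknown copy-number variation regions;</p>
<p>4. Can be used for copy-neutral LOH/UPD detection, paternity testing, homozygosity analysis, kinship determination, and research on genetic or other diseases.</p>
<p>Reference: <a href="http://www.biomart.cn/specials/cnv2014/article/84169">http://www.biomart.cn/specials/cnv2014/article/84169</a></div>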
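<p>The probe counts quoted above can be cross-checked with simple arithmetic. The short Python check below uses only numbers from the text; the variable names are chosen here for illustration.</p>

```python
# Counts as quoted in the text above.
snp_probes = 906_600          # SNP probes
cnv_probes = 946_000          # non-polymorphic CNV probes
known_region_probes = 202_000 # probes on known CNV regions
genome_wide_probes = 744_000  # probes spread across the genome

total_markers = snp_probes + cnv_probes
print(total_markers)

# The two CNV probe groups should add up to the CNV total.
cnv_sum = known_region_probes + genome_wide_probes
print(cnv_sum == cnv_probes)

# SNPs inherited from earlier arrays plus newly added SNPs (rounded figures).
snp_sum = 482_000 + 424_000
print(snp_sum)
```

<p>The total is consistent with the "more than 1,800,000 markers" claim, and the rounded SNP figures fall 600 short of 906,600, which is within rounding.</p>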
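<div>This array can be found in NCBI's GEO database, which already holds data for more than ten thousand samples.</div>
<div>The first entry in the figure is the nearly one thousand samples from the CCLE project, possibly run on a customized SNP 6.0 array.</div>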
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard.png"><img class="alignnone size-full wp-image-1296" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard.png" alt="clipboard" width="1028" height="343" /></a></div>
<div>A great many papers have been published using data from this array; see the list: <a href="http://media.affymetrix.com/support/technical/other/snp6_array_publications.pdf">http://media.affymetrix.com/support/technical/other/snp6_array_publications.pdf</a></div>
<div>There is also a 2010 Nature paper describing how to study CNV with PICNIC: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3145113/">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3145113/</a></div>
<div>Another 2010 paper proposed new software for analyzing CNV data from this array: <a href="http://bioinformatics.oxfordjournals.org/content/26/11/1395.long">http://bioinformatics.oxfordjournals.org/content/26/11/1395.long</a></div>
<div>A great deal of software offers the same functionality, including a family of R packages from Bioconductor:</div>
<div><a href="http://www.bioconductor.org/help/search/index.html?q=cnv/">http://www.bioconductor.org/help/search/index.html?q=cnv/</a></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard2.png"><img class="alignnone size-full wp-image-1297" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard2.png" alt="clipboard2" width="710" height="602" /></a></div>
<div>Browsing almost any entry turns up plenty of raw data that you can analyze yourself:</div>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/browse/?view=samples&amp;platform=6801">http://www.ncbi.nlm.nih.gov/geo/browse/?view=samples&amp;platform=6801</a></div>
<div>For example: <a href="ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1949nnn/GSM1949207/suppl/GSM1949207_SB_CID0102B_071708.CEL.gz">ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1949nnn/GSM1949207/suppl/GSM1949207%5FSB%5FCID0102B%5F071708%2ECEL%2Egz</a></div>
<div></div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1295.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
