<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; SomaticSignatures</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/somaticsignatures/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Using the SomaticSignatures package to derive mutation signatures from MAF mutation data</title>
		<link>http://www.bio-info-trainee.com/1623.html</link>
		<comments>http://www.bio-info-trainee.com/1623.html#comments</comments>
		<pubDate>Fri, 06 May 2016 12:26:19 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[mutation]]></category>
		<category><![CDATA[signature]]></category>
		<category><![CDATA[SomaticSignatures]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1623</guid>
		<description><![CDATA[The concept of a mutation signature is fairly recent; from my reading of the literature, it first appeared in &#8230; <a href="http://www.bio-info-trainee.com/1623.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
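				<content:encoded><![CDATA[<p>The concept of a mutation signature is fairly recent. From my reading of the literature, it first appeared in a 2013 <a href="http://www.nature.com/nature/journal/v500/n7463/full/nature12477.html">Nature paper</a>, where it was used mainly to describe the somatic mutations of cancer patients.</p>
<p>First you need to analyze cancer sample data yourself to obtain somatic mutations. <a href="https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files">The TCGA project has by now accumulated a large number of somatic mutation results</a>, so you can pick whichever cancer dataset interests you and use it to work out mutation signatures.</p>
<p>Here I recommend a tool from the Bioconductor family of R packages: <a href="http://www.bioconductor.org/packages/3.3/bioc/vignettes/SomaticSignatures/inst/doc/SomaticSignatures-vignette.html">SomaticSignatures</a>.</p>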
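<p>Its vignette is already very detailed, and once you understand the concept of a mutation signature the package is easy to use. Writing your own script is also quite easy: for each mutation, look up the base immediately before and after its position in the genome, combine them into a trinucleotide mutation pattern, and finally tally the distribution of the 96 mutation patterns.</p>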
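<p>The do-it-yourself approach described above can be sketched in base R. This is only a minimal illustration under stated assumptions: the reference sequence, positions and alternate alleles are made up, complement and classify_snv are hypothetical helper names, and the labels merely imitate the "alteration context" style (for example CA A.G) that SomaticSignatures reports. Mutations whose reference base is a purine are rewritten on the opposite strand, so only C and T references remain, which is what gives 6 x 16 = 96 types.</p>

```r
## Toy reference sequence and SNVs (all values are made up for illustration).
ref_seq = "ACGTTCGATCCGTAGCTAGGCTTACG"
snvs = data.frame(
    pos = c(5, 10, 15, 20),            # 1-based positions in ref_seq
    alt = c("A", "T", "T", "A"),       # alternate alleles
    stringsAsFactors = FALSE
)

## Hypothetical helper: complement a string of bases.
complement = function(x) chartr("ACGT", "TGCA", x)

## Hypothetical helper: label one SNV as "alteration context", e.g. "CA A.G".
classify_snv = function(seq, pos, alt) {
    tri = substring(seq, pos - 1, pos + 1)   # base before, mutated base, base after
    ref = substr(tri, 2, 2)
    if (ref %in% c("A", "G")) {
        ## Purine reference: describe the mutation on the opposite strand.
        tri = paste(rev(strsplit(complement(tri), "")[[1]]), collapse = "")
        ref = complement(ref)
        alt = complement(alt)
    }
    paste0(ref, alt, " ", substr(tri, 1, 1), ".", substr(tri, 3, 3))
}

## Enumerate all 96 types: 6 alterations x 16 flanking-base contexts.
bases = c("A", "C", "G", "T")
alterations = c("CA", "CG", "CT", "TA", "TC", "TG")
contexts = as.vector(outer(bases, bases, paste, sep = "."))
all_motifs = as.vector(outer(alterations, contexts, paste))

## Classify every SNV and tally the counts over the 96 types.
motifs = mapply(classify_snv, snvs$pos, snvs$alt,
                MoreArgs = list(seq = ref_seq))
counts = table(factor(motifs, levels = all_motifs))
print(counts[counts != 0])
```

<p>On real data the flanking bases would come from a reference genome package rather than a string, which is the part that mutationContext takes care of in the walkthrough below.</p>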
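<p>Here is a brief walkthrough of how to use the package.</p>
<p>First, install and load the required packages:</p>
<div>library(SomaticSignatures)  ## the package itself</div>
<div>library(SomaticCancerAlterations) ## bundled test data</div>
<div>library(BSgenome.Hsapiens.1000genomes.hs37d5)  ## our reference genome</div>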
<div>library(VariantAnnotation)</div>
<div>## This object is important: the GRanges class of the GenomicRanges package</div>
<div>
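<div>## The SomaticCancerAlterations package provides the test data, which come from exome sequencing projects on 8 different cancers.</div>
<div>sca_metadata = scaMetadata()</div>
<div>### This shows a description of the 8 projects, each of which sequenced several hundred samples. Here we only care about the mutation data, and only the somatic mutations.</div>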
<div>sca_data = unlist(scaLoadDatasets())</div>
</div>
<p>Next, build a GRanges object from the mutation data; see my earlier blog posts for details.</p>
<div>sca_data$study = factor(gsub("(.*)_(.*)", "\\1", toupper(names(sca_data))))</div>
<div>sca_data = unname(subset(sca_data, Variant_Type %in% "SNP"))</div>
<div>sca_data = keepSeqlevels(sca_data, hsAutosomes())</div>
<div>## This object is the input to the package</div>
<div>sca_vr = VRanges(</div>
<div>    seqnames = seqnames(sca_data),</div>
<div>    ranges = ranges(sca_data),</div>
<div>    ref = sca_data$Reference_Allele,</div>
<div>    alt = sca_data$Tumor_Seq_Allele2,</div>
<div>    sampleNames = sca_data$Patient_ID,</div>
<div>    seqinfo = seqinfo(sca_data),</div>
<div>    study = sca_data$study</div>
<div>)</div>
<div>## A local somatic mutation file can also be read directly with readVcf or readMutect</div>
<div>## Extract the mutation data and build a VRanges object.</div>
<div>sca_vr</div>
<div></div>
<div>
<div>### Take a quick look at how many somatic mutations each study has</div>
<div>sort(table(sca_vr$study), decreasing = TRUE)</div>
<div>    LUAD   SKCM   HNSC   LUSC   KIRC    GBM   THCA     OV</div>
<div>   208724 200589  67125  61485  24158  19938   6716   5872</div>
<div>## Use mutationContext with the VRanges object and the downloaded reference genome to get the sequence context of each mutation.</div>
<div>sca_motifs = mutationContext(sca_vr, BSgenome.Hsapiens.1000genomes.hs37d5)</div>
<div>head(sca_motifs)</div>
<div>## The VRanges object now has two extra columns: alteration and context</div>
<div></div>
<div>## Next, from the mutation context data, build the matrix M of the form {motifs × studies}</div>
<div>sca_mm = motifMatrix(sca_motifs, group = "study", normalize = TRUE)</div>
<div>## normalize = TRUE builds the matrix from the frequencies of the 96 mutation types rather than their raw counts</div>
<div>head(round(sca_mm, 4))</div>
<div>## Then plot the mutation spectrum of each study directly</div>
<div>plotMutationSpectrum(sca_motifs, "study")</div>
<div> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/mutation-spectrum.png"><img class="alignnone wp-image-1625 size-medium" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/mutation-spectrum-260x260.png" alt="mutation spectrum" width="260" height="260" /></a></div>
<div>## The spectrum still has to be decomposed into signatures!</div>
<div>## The package provides two methods for this: NMF and PCA</div>
<div>n_sigs = 5</div>
<div>sigs_nmf = identifySignatures(sca_mm, n_sigs, nmfDecomposition)</div>
<div>sigs_pca = identifySignatures(sca_mm, n_sigs, pcaDecomposition)</div>
<div></div>
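<div>## Several functions are also provided for exploring the result: signatures, samples, observed and fitted.</div>
<div>## The one to master is assessNumberSignatures, which explores how many signatures the spectrum should be decomposed into</div>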
<div>n_sigs = 2:8</div>
<div>gof_nmf = assessNumberSignatures(sca_mm, n_sigs, nReplicates = 5)</div>
<div>gof_pca = assessNumberSignatures(sca_mm, n_sigs, pcaDecomposition)</div>
<div>plotNumberSignatures(gof_nmf)  ## visualize the result</div>
<div></div>
<div>## Next, visualize the share of each signature in each cancer type (the matrix was grouped by study, so each sample here is one study)</div>
<div>library(ggplot2)</div>
<div>plotSignatureMap(sigs_nmf) + ggtitle("Somatic Signatures: NMF - Heatmap")</div>
<div>plotSignatures(sigs_nmf) + ggtitle("Somatic Signatures: NMF - Barchart")</div>
<div>plotObservedSpectrum(sigs_nmf)</div>
<div>plotFittedSpectrum(sigs_nmf)</div>
<div>plotSampleMap(sigs_nmf)</div>
<div>plotSamples(sigs_nmf)</div>
<div></div>
<div>Likewise, the PCA results can be visualized in the same way:</div>
<div>plotSignatureMap(sigs_pca) + ggtitle("Somatic Signatures: PCA - Heatmap")</div>
<div>plotSignatures(sigs_pca) + ggtitle("Somatic Signatures: PCA - Barchart")</div>
<div>plotFittedSpectrum(sigs_pca)</div>
<div>plotObservedSpectrum(sigs_pca)</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/mutation-signature-NMF.png"><img class="alignnone  wp-image-1624" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/mutation-signature-NMF.png" alt="mutation signature NMF" width="608" height="608" /></a></div>
<div>It is worth noting that all the plot functions are based on ggplot2, so the figures can be customized further.</div>
<div>p = plotSamples(sigs_nmf)</div>
<div></div>
<div>## (re)move the legend</div>
<div>p = p + theme(legend.position = "none")</div>
<div>## (re)label the axis</div>
<div>p = p + xlab("Studies")</div>
<div>## add a title</div>
<div>p = p + ggtitle("Somatic Signatures in TCGA WES Data")</div>
<div>## change the color scale</div>
<div>p = p + scale_fill_brewer(palette = "Blues")</div>
<div>## decrease the size of x-axis labels</div>
<div>p = p + theme(axis.text.x = element_text(size = 9))</div>
<div></div>
<div>### Of course, the motif matrix can also be clustered</div>
<div>clu_motif = clusterSpectrum(sca_mm, "motif")</div>
<div>library(ggdendro)</div>
<div>p = ggdendrogram(clu_motif, rotate = TRUE)</div>
<div>p</div>
<div></div>
<div></div>
<div>## Finally, since we combined 8 different studies, batch effects are bound to be present and should be removed if possible.</div>
</div>
<div></div>
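<p>To make the NMF step in the walkthrough less of a black box, here is a minimal base R sketch of the idea: a non-negative matrix M (motifs x studies) is approximated by the product of W (motifs x signatures) and H (signatures x studies). This is an illustration under assumptions, not the code that nmfDecomposition runs: the matrix is random toy data, and the Lee-Seung multiplicative updates are simply one well-known way to fit such a factorization.</p>

```r
set.seed(1)

## Toy stand-in for the motif matrix: 6 motifs x 4 studies.
## Each column is scaled to sum to 1, like a frequency-normalized spectrum.
M = matrix(runif(24), nrow = 6)
M = sweep(M, 2, colSums(M), "/")

r = 2                                              # number of signatures
W = matrix(runif(nrow(M) * r), nrow = nrow(M))     # motifs x signatures
H = matrix(runif(r * ncol(M)), nrow = r)           # signatures x studies
eps = 1e-9                                         # avoids division by zero

rss_start = sum((M - W %*% H)^2)                   # fit before any update

for (i in 1:500) {
    ## Multiplicative updates for the squared-error objective;
    ## they keep every entry of W and H non-negative.
    H = H * (t(W) %*% M) / (t(W) %*% W %*% H + eps)
    W = W * (M %*% t(H)) / (W %*% H %*% t(H) + eps)
}

rss = sum((M - W %*% H)^2)                         # residual sum of squares
print(c(start = rss_start, end = rss))
```

<p>Repeating such a fit for several values of r and comparing the residual sum of squares is the kind of comparison that helps when choosing the number of signatures.</p>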
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1623.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
