<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; samr</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/samr/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Differential analysis of microarray data with the samr package</title>
		<link>http://www.bio-info-trainee.com/1608.html</link>
		<comments>http://www.bio-info-trainee.com/1608.html#comments</comments>
		<pubDate>Thu, 05 May 2016 11:43:04 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[基础数据库]]></category>
		<category><![CDATA[基础软件]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[limma]]></category>
		<category><![CDATA[samr]]></category>
		<category><![CDATA[差异分析]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1608</guid>
		<description><![CDATA[There are already plenty of tools and packages for differential analysis, and limma is very mature, so I was not planning &#8230; <a href="http://www.bio-info-trainee.com/1608.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
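				<content:encoded><![CDATA[<blockquote><p>There are already plenty of tools and packages for differential analysis, and limma is very mature, so I was not planning to cover this one. But a classmate happened to ask about the package, so I gave it a quick test and checked whether its results differ from limma's. I could not resist writing down the test workflow!</p></blockquote>
<blockquote><p>Learning a package is actually simple: find its official page and read the manual! <a href="https://cran.r-project.org/web/packages/samr/samr.pdf">Link to the manual</a></p></blockquote>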
<p><span id="more-1608"></span></p>
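<p>The samr package is even simpler. It is built around a single method, <strong>SAM</strong>, which is wrapped into two functions depending on the type of data: <strong>SAMseq</strong> for high-throughput sequencing data and <strong>samr</strong> for microarray data. Here I only cover microarray data, followed by a quick comparison with limma.</p>
<p>So all we need to do is prepare the data and learn how to call the samr function.</p>
<p>As before, we use the test data from the CLL package to demonstrate the usage. The first step, again, is to build the expression matrix and the grouping information.</p>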
<blockquote>
<pre class="r"><code class="r"><span class="identifier">suppressPackageStartupMessages</span><span class="paren">(</span><span class="keyword">library</span><span class="paren">(</span><span class="identifier">CLL</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">data</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="paren">)</span>
<span class="identifier">exprSet</span><span class="operator">=</span><span class="identifier">exprs</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="paren">)</span>   <span class="comment">## sCLLex is an object provided by the CLL package</span>
<span class="identifier">samples</span><span class="operator">=</span><span class="identifier">sampleNames</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="paren">)</span>
<span class="identifier">pdata</span><span class="operator">=</span><span class="identifier">pData</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="paren">)</span>
<span class="identifier">group_list</span><span class="operator">=</span><span class="identifier">as.character</span><span class="paren">(</span><span class="identifier">pdata</span><span class="paren">[</span>,<span class="number">2</span><span class="paren">]</span><span class="paren">)</span>
<span class="identifier">group_list</span></code></pre>
<pre><code>##  [1] "progres." "stable"   "progres." "progres." "progres." "progres."
##  [7] "stable"   "stable"   "progres." "stable"   "progres." "stable"  
## [13] "progres." "stable"   "stable"   "progres." "progres." "progres."
## [19] "progres." "progres." "progres." "stable"</code></pre>
<pre class="r"><code class="r"><span class="identifier">as.numeric</span><span class="paren">(</span><span class="identifier">as.factor</span><span class="paren">(</span><span class="identifier">group_list</span><span class="paren">)</span><span class="paren">)</span></code></pre>
<pre><code>##  [1] 1 2 1 1 1 1 2 2 1 2 1 2 1 2 2 1 1 1 1 1 1 2</code></pre>
</blockquote>
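<p>The expression matrix exprSet and the grouping information group_list can now be used directly for differential analysis. samr is a little particular about the grouping input: it expects a numeric vector such as 1,1,1,2,2,2, which is why I use as.numeric(as.factor(group_list)). See the code below for details.</p>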
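<p>As a small illustration of that encoding, the sketch below uses a made-up four-sample label vector (toy_groups is invented here, it is not the CLL data) to show which label ends up as 1 and which as 2. as.factor() sorts its levels, so for these strings "progres." becomes 1 and "stable" becomes 2. It is worth checking this mapping before interpreting the direction of any difference.</p>
<blockquote>
<pre class="r"><code class="r">## toy labels, invented for illustration only
toy_groups = c("progres.", "stable", "progres.", "stable")
toy_factor = as.factor(toy_groups)
levels(toy_factor)                  # the level order decides the numeric codes
toy_y = as.numeric(toy_factor)      # 1 = first level, 2 = second level
## cross-tabulate labels against codes to confirm the mapping
table(label = toy_groups, code = toy_y)</code></pre>
</blockquote>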
<blockquote>
<pre class="r"><code class="r"><span class="identifier">suppressPackageStartupMessages</span><span class="paren">(</span><span class="keyword">library</span><span class="paren">(</span><span class="identifier">samr</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">data</span><span class="operator">=</span><span class="identifier">list</span><span class="paren">(</span><span class="identifier">x</span><span class="operator">=</span><span class="identifier">exprSet</span>,<span class="identifier">y</span><span class="operator">=</span><span class="identifier">as.numeric</span><span class="paren">(</span><span class="identifier">as.factor</span><span class="paren">(</span><span class="identifier">group_list</span><span class="paren">)</span><span class="paren">)</span>, 
          <span class="identifier">geneid</span><span class="operator">=</span><span class="identifier">as.character</span><span class="paren">(</span><span class="number">1</span><span class="operator">:</span><span class="identifier">nrow</span><span class="paren">(</span><span class="identifier">exprSet</span><span class="paren">)</span><span class="paren">)</span>,
          <span class="identifier">genenames</span><span class="operator">=</span><span class="identifier">rownames</span><span class="paren">(</span><span class="identifier">exprSet</span><span class="paren">)</span>, 
          <span class="identifier">logged2</span><span class="operator">=</span><span class="literal">TRUE</span>
<span class="paren">)</span>
<span class="identifier">samr.obj</span><span class="operator">&lt;-</span><span class="identifier">samr</span><span class="paren">(</span><span class="identifier">data</span>, <span class="identifier">resp.type</span><span class="operator">=</span><span class="string">"Two class unpaired"</span>, <span class="identifier">nperms</span><span class="operator">=</span><span class="number">100</span><span class="paren">)</span></code></pre>
</blockquote>
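<p>That is really all it takes. What matters is how to tune the function's arguments and how to interpret the object it returns (samr.obj is very important; it determines whether you can really make good use of samr).</p>
<p>The genenames I pass here are actually probe IDs; for a real analysis you can change them. I set nperms to 100, which can also be changed; 1000 is the usual choice.</p>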
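<p>To give a feel for what the number of permutations controls, here is a minimal base-R sketch of a label-permutation test on simulated data. It is not the SAM algorithm itself, which uses its own modified statistic and FDR estimate; row_t is a hypothetical helper written for this sketch, and the matrix x is random noise with ten rows shifted artificially.</p>
<blockquote>
<pre class="r"><code class="r">set.seed(1)
n_genes = 200
y = rep(c(1, 2), each = 6)                 # two classes, 6 samples each
x = matrix(rnorm(n_genes * length(y)), nrow = n_genes)
x[1:10, y == 2] = x[1:10, y == 2] + 3      # the first 10 rows truly differ

## hypothetical helper: Welch-type t statistic for every row of x
row_t = function(x, y) {
  a = x[, y == 1, drop = FALSE]
  b = x[, y == 2, drop = FALSE]
  se = sqrt(apply(a, 1, var) / ncol(a) + apply(b, 1, var) / ncol(b))
  (rowMeans(b) - rowMeans(a)) / se
}

nperms = 100
tt = row_t(x, y)                           # observed statistics
## one column of statistics per random relabelling of the samples
ttstar = sapply(seq_len(nperms), function(i) row_t(x, sample(y)))

## pooled null: share of permuted |t| values larger than each observed |t|
null_cdf = ecdf(as.vector(abs(ttstar)))
pv_perm = 1 - null_cdf(abs(tt))
summary(pv_perm[1:10])      # the shifted rows
summary(pv_perm[-(1:10)])   # the unshifted rows</code></pre>
</blockquote>
<p>In this pooled scheme the smallest non-zero p-value is 1/(nperms * n_genes), so more permutations give a finer p-value grid and a more stable null distribution.</p>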
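<p>Besides finding differentially expressed genes directly, the package also offers several standalone functions.</p>
<p>The first one normalizes the expression matrix:</p>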
<blockquote>
<pre class="r"><code class="r"><span class="identifier">x.norm</span> <span class="operator">&lt;-</span> <span class="identifier">samr.norm.data</span><span class="paren">(</span><span class="identifier">data</span><span class="operator">$</span><span class="identifier">x</span><span class="paren">)</span>
<span class="identifier">par</span><span class="paren">(</span><span class="identifier">mfrow</span><span class="operator">=</span><span class="identifier">c</span><span class="paren">(</span><span class="number">1</span>,<span class="number">2</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">boxplot</span><span class="paren">(</span><span class="identifier">exprSet</span>, <span class="identifier">col</span> <span class="operator">=</span> <span class="identifier">rainbow</span><span class="paren">(</span><span class="identifier">ncol</span><span class="paren">(</span><span class="identifier">exprSet</span><span class="paren">)</span><span class="paren">)</span>,<span class="identifier">main</span><span class="operator">=</span><span class="string">"before normalization"</span>,<span class="identifier">las</span><span class="operator">=</span><span class="number">2</span><span class="paren">)</span>
<span class="identifier">boxplot</span><span class="paren">(</span><span class="identifier">x.norm</span>,  <span class="identifier">col</span> <span class="operator">=</span> <span class="identifier">rainbow</span><span class="paren">(</span><span class="identifier">ncol</span><span class="paren">(</span><span class="identifier">exprSet</span><span class="paren">)</span><span class="paren">)</span>,<span class="identifier">main</span><span class="operator">=</span><span class="string">"after normalization"</span>,<span class="identifier">las</span><span class="operator">=</span><span class="number">2</span><span class="paren">)</span></code></pre>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/QQ截图20160505194154.png"><img class="alignnone size-full wp-image-1609" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/QQ截图20160505194154.png" alt="QQ截图20160505194154" width="720" height="503" /></a></p>
</blockquote>
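<p>Judging from the plots, normalization seems to make little difference here.</p>
<p>I will not go through the other functions one by one; feel free to explore them yourself.</p>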
<p>* samr.plot(samr.obj, del, min.foldchange=0)</p>
<p>* samr.plot(samr.obj, del=.3)</p>
<p>* samr.assess.samplesize.obj&lt;- samr.assess.samplesize(samr.obj, data, log2(1.5))</p>
<p>* samr.assess.samplesize.plot(samr.assess.samplesize.obj)</p>
<p>Let's focus on how the differential results from samr compare with those from limma.</p>
<blockquote>
<pre class="r"><code class="r"><span class="comment">## First, extract the p-values from the samr differential analysis</span>
<span class="identifier">pv</span><span class="operator">=</span><span class="identifier">samr.pvalues.from.perms</span><span class="paren">(</span><span class="identifier">samr.obj</span><span class="operator">$</span><span class="identifier">tt</span>, <span class="identifier">samr.obj</span><span class="operator">$</span><span class="identifier">ttstar</span><span class="paren">)</span>
<span class="comment">## Then extract the p-values from the limma differential analysis</span>
<span class="keyword">library</span><span class="paren">(</span><span class="identifier">limma</span><span class="paren">)</span> 
<span class="identifier">design</span><span class="operator">=</span><span class="identifier">model.matrix</span><span class="paren">(</span><span class="operator">~</span><span class="identifier">factor</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="operator">$</span><span class="identifier">Disease</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">fit</span><span class="operator">=</span><span class="identifier">lmFit</span><span class="paren">(</span><span class="identifier">sCLLex</span>,<span class="identifier">design</span><span class="paren">)</span>
<span class="identifier">fit</span><span class="operator">=</span><span class="identifier">eBayes</span><span class="paren">(</span><span class="identifier">fit</span><span class="paren">)</span>
<span class="identifier">options</span><span class="paren">(</span><span class="identifier">digits</span> <span class="operator">=</span> <span class="number">4</span><span class="paren">)</span>
<span class="identifier">DEG_limma</span><span class="operator">=</span><span class="identifier">topTable</span><span class="paren">(</span><span class="identifier">fit</span>,<span class="identifier">coef</span><span class="operator">=</span><span class="number">2</span>,<span class="identifier">adjust</span><span class="operator">=</span><span class="string">'BH'</span>,<span class="identifier">n</span><span class="operator">=</span><span class="literal">Inf</span><span class="paren">)</span> 
<span class="identifier">pv_limma</span><span class="operator">=</span><span class="identifier">DEG_limma</span><span class="operator">$</span><span class="identifier">P.Value</span>
<span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv_limma</span><span class="paren">)</span><span class="operator">=</span><span class="identifier">rownames</span><span class="paren">(</span><span class="identifier">DEG_limma</span><span class="paren">)</span>
<span class="identifier">head</span><span class="paren">(</span><span class="identifier">pv</span><span class="paren">[</span><span class="identifier">sort</span><span class="paren">(</span><span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv</span><span class="paren">)</span><span class="paren">)</span><span class="paren">]</span><span class="paren">)</span></code></pre>
<pre><code>##  100_g_at   1000_at   1001_at 1002_f_at 1003_s_at   1004_at 
##    0.2531    0.4144    0.5671    0.5686    0.4687    0.6340</code></pre>
<pre class="r"><code class="r"><span class="identifier">head</span><span class="paren">(</span><span class="identifier">pv_limma</span><span class="paren">[</span><span class="identifier">sort</span><span class="paren">(</span><span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv_limma</span><span class="paren">)</span><span class="paren">)</span><span class="paren">]</span><span class="paren">)</span></code></pre>
<pre><code>##  100_g_at   1000_at   1001_at 1002_f_at 1003_s_at   1004_at 
##    0.2497    0.4312    0.5349    0.5498    0.4361    0.6473</code></pre>
<pre class="r"><code class="r"><span class="identifier">cor</span><span class="paren">(</span><span class="identifier">pv</span><span class="paren">[</span><span class="identifier">sort</span><span class="paren">(</span><span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv</span><span class="paren">)</span><span class="paren">)</span><span class="paren">]</span>,<span class="identifier">pv_limma</span><span class="paren">[</span><span class="identifier">sort</span><span class="paren">(</span><span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv_limma</span><span class="paren">)</span><span class="paren">)</span><span class="paren">]</span><span class="paren">)</span></code></pre>
<pre><code>## [1] 0.9976</code></pre>
</blockquote>
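<p>Numerically there is no essential difference between the two, and the correlation between the two sets of p-values is as high as 0.9976.</p>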
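<p>Indexing each vector by its own sorted names lines the values up only when both vectors carry exactly the same probe names. A slightly more defensive sketch, using made-up named p-values (pv_a and pv_b are hypothetical stand-ins, not the real results), aligns on the shared names explicitly and also reports a rank correlation and the overlap of the top hits, which compare the ordering rather than the raw values.</p>
<blockquote>
<pre class="r"><code class="r">## made-up named p-values, stored in different orders on purpose
pv_a = c(g3 = 0.20, g1 = 0.01, g2 = 0.50, g4 = 0.03)
pv_b = c(g1 = 0.02, g2 = 0.45, g3 = 0.25, g4 = 0.04)

## align both vectors on the names they share
common = intersect(names(pv_a), names(pv_b))
a = pv_a[common]
b = pv_b[common]

cor(a, b)                          # Pearson correlation of the raw p-values
cor(a, b, method = "spearman")     # rank correlation of the ordering

## overlap of the two most significant entries in each list
top_a = names(sort(a))[1:2]
top_b = names(sort(b))[1:2]
intersect(top_a, top_b)</code></pre>
</blockquote>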
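<p>So my conclusion, at least on this dataset, is that there is no need to juggle so many packages: limma is enough, and even a plain t-test would do.</p>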
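<p>For reference, the plain t-test route can be sketched in base R as follows. This runs on simulated data (toy_expr and toy_group are hypothetical names, with five rows shifted artificially), and the BH adjustment via p.adjust mirrors the adjust='BH' setting passed to topTable.</p>
<blockquote>
<pre class="r"><code class="r">set.seed(2)
toy_group = factor(rep(c("progres.", "stable"), each = 5))
toy_expr = matrix(rnorm(50 * 10), nrow = 50,
                  dimnames = list(paste0("probe", 1:50), NULL))
## shift the first 5 probes in the second group so they truly differ
toy_expr[1:5, toy_group == "stable"] = toy_expr[1:5, toy_group == "stable"] + 4

## one Welch t-test per row (probe), keeping only the p-value
pv_t = apply(toy_expr, 1, function(v) t.test(v ~ toy_group)$p.value)
padj_t = p.adjust(pv_t, method = "BH")   # Benjamini-Hochberg adjustment
head(sort(padj_t))</code></pre>
</blockquote>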
<p>Also, plot and summary can be applied directly to the samr result object, samr.obj.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1608.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
