<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; eset</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/eset/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Feel free to leave comments and join the discussion on the forum biotrainee.com, or follow the WeChat public account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>A brief introduction to the ExpressionSet object</title>
		<link>http://www.bio-info-trainee.com/1510.html</link>
		<comments>http://www.bio-info-trainee.com/1510.html#comments</comments>
		<pubDate>Sat, 09 Apr 2016 02:06:10 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[生信基础]]></category>
		<category><![CDATA[biobase]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[eset]]></category>
		<category><![CDATA[expressionset]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1510</guid>
		<description><![CDATA[This is a simple test from our Bioconductor Chinese community. It seems that once it is placed in the blog, the mark &#8230; <a href="http://www.bio-info-trainee.com/1510.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
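				<content:encoded><![CDATA[<h1><code>This is a simple test from our Bioconductor Chinese community</code></h1>
<pre>The Markdown syntax seems to break when posted on this blog, so feel free to <a href="https://github.com/bioconductor-china/basic/blob/master/ExpressionSet.md">read it directly on GitHub</a>.</pre>
<blockquote><p>This object is essentially a wrapper around an expression matrix plus the sample grouping information, and it is introduced by the Biobase package. It inherits from the eSet class.</p></blockquote>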
<h2><a id="user-content-一个现成例子" class="anchor" href="https://github.com/bioconductor-china/basic/blob/master/ExpressionSet.md#%E4%B8%80%E4%B8%AA%E7%8E%B0%E6%88%90%E4%BE%8B%E5%AD%90"></a>A ready-made example</h2>
<blockquote><p>Below is a concrete example from the CLL package, in which 22 samples were measured on the hgu95av2 array.</p></blockquote>
<div class="highlight highlight-source-r">
<blockquote>
<pre>    <span class="pl-k">&gt;</span> library(<span class="pl-smi">CLL</span>)
    <span class="pl-k">&gt;</span> data(<span class="pl-smi">sCLLex</span>)
    <span class="pl-k">&gt;</span> <span class="pl-smi">sCLLex</span>
    ExpressionSet (<span class="pl-smi">storageMode</span><span class="pl-k">:</span> <span class="pl-smi">lockedEnvironment</span>)
    <span class="pl-smi">assayData</span><span class="pl-k">:</span> <span class="pl-c1">12625</span> <span class="pl-smi">features</span>, <span class="pl-c1">22</span> <span class="pl-smi">samples</span>  <span class="pl-c">## expression matrix</span>
      <span class="pl-smi">element</span> <span class="pl-smi">names</span><span class="pl-k">:</span> <span class="pl-smi">exprs</span> 
    <span class="pl-smi">protocolData</span><span class="pl-k">:</span> <span class="pl-smi">none</span>
    <span class="pl-smi">phenoData</span>
      <span class="pl-smi">sampleNames</span><span class="pl-k">:</span> <span class="pl-smi">CLL11.CEL</span> <span class="pl-smi">CLL12.CEL</span> <span class="pl-k">...</span> CLL9.CEL (<span class="pl-c1">22</span> <span class="pl-smi">total</span>)
      <span class="pl-smi">varLabels</span><span class="pl-k">:</span> <span class="pl-smi">SampleID</span> <span class="pl-smi">Disease</span>   <span class="pl-c">## sample grouping information</span>
      <span class="pl-smi">varMetadata</span><span class="pl-k">:</span> <span class="pl-smi">labelDescription</span>
    <span class="pl-smi">featureData</span><span class="pl-k">:</span> <span class="pl-smi">none</span>
    <span class="pl-smi">experimentData</span><span class="pl-k">:</span> <span class="pl-smi">use</span> <span class="pl-s"><span class="pl-pds">'</span>experimentData(object)<span class="pl-pds">'</span></span>
    <span class="pl-smi">Annotation</span><span class="pl-k">:</span> <span class="pl-smi">hgu95av2</span> 
    <span class="pl-k">&gt;</span> <span class="pl-v">exprMatrix</span><span class="pl-k">=</span>exprs(<span class="pl-smi">sCLLex</span>)
    <span class="pl-k">&gt;</span> dim(<span class="pl-smi">exprMatrix</span>)
    [<span class="pl-c1">1</span>] <span class="pl-c1">12625</span>    <span class="pl-c1">22</span>
    <span class="pl-k">&gt;</span> <span class="pl-v">meta</span><span class="pl-k">=</span>pData(<span class="pl-smi">sCLLex</span>)
    <span class="pl-k">&gt;</span> table(<span class="pl-smi">meta</span><span class="pl-k">$</span><span class="pl-smi">Disease</span>)

    <span class="pl-smi">progres</span>.   <span class="pl-smi">stable</span> 
          <span class="pl-c1">14</span>        <span class="pl-c1">8</span> 
    <span class="pl-k">&gt;</span></pre>
</blockquote>
</div>
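<pre><code>From the output above, the array has 12625 probes, and the 22 samples split into two groups by disease status, 14 vs 8.
This data object can be passed directly as input to many analysis packages.
When working with this object, the key steps are extracting the expression matrix with the `exprs` function and inspecting the sample grouping information with the `pData` function.
</code></pre>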
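<p>As a minimal sketch of the two accessors, the block below uses <code>sample.ExpressionSet</code>, a small demo object bundled with Biobase, in place of <code>sCLLex</code> so that no extra data package is needed. The variable names <code>expr_matrix</code> and <code>sample_info</code> are illustrative only.</p>

```r
library(Biobase)

# sample.ExpressionSet ships with Biobase; it stands in for sCLLex here.
data(sample.ExpressionSet)

expr_matrix <- exprs(sample.ExpressionSet)  # features x samples numeric matrix
sample_info <- pData(sample.ExpressionSet)  # data.frame with one row per sample

dim(expr_matrix)

# The columns of the expression matrix and the rows of the sample table
# describe the same samples, in the same order.
identical(colnames(expr_matrix), rownames(sample_info))

# Count samples per level of one phenotype variable.
table(sample_info$sex)
```

<p>The same two calls work unchanged on <code>sCLLex</code> once the CLL package is installed.</p>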
<h2><a id="user-content-limma等包使用该对象作为输入数据" class="anchor" href="https://github.com/bioconductor-china/basic/blob/master/ExpressionSet.md#limma%E7%AD%89%E5%8C%85%E4%BD%BF%E7%94%A8%E8%AF%A5%E5%AF%B9%E8%B1%A1%E4%BD%9C%E4%B8%BA%E8%BE%93%E5%85%A5%E6%95%B0%E6%8D%AE"></a>Packages such as limma take this object as input</h2>
<blockquote><p>The example below shows clearly how important the <code>ExpressionSet</code> object is.</p></blockquote>
<div class="highlight highlight-source-r">
<blockquote>
<pre>    <span class="pl-k">&gt;</span> library(<span class="pl-smi">limma</span>)
    <span class="pl-k">&gt;</span> <span class="pl-v">design</span><span class="pl-k">=</span>model.matrix(<span class="pl-k">~</span><span class="pl-k">factor</span>(<span class="pl-smi">sCLLex</span><span class="pl-k">$</span><span class="pl-smi">Disease</span>))
    <span class="pl-k">&gt;</span> <span class="pl-v">fit</span><span class="pl-k">=</span>lmFit(<span class="pl-smi">sCLLex</span>,<span class="pl-smi">design</span>)
    <span class="pl-k">&gt;</span> <span class="pl-v">fit</span><span class="pl-k">=</span>eBayes(<span class="pl-smi">fit</span>)
    <span class="pl-k">&gt;</span> options(<span class="pl-v">digits</span> <span class="pl-k">=</span> <span class="pl-c1">4</span>)
    <span class="pl-k">&gt;</span> topTable(<span class="pl-smi">fit</span>,<span class="pl-v">coef</span><span class="pl-k">=</span><span class="pl-c1">2</span>,<span class="pl-v">adjust</span><span class="pl-k">=</span><span class="pl-s"><span class="pl-pds">'</span>BH<span class="pl-pds">'</span></span>)
               <span class="pl-smi">logFC</span> <span class="pl-smi">AveExpr</span>      <span class="pl-smi">t</span>   <span class="pl-smi">P.Value</span> <span class="pl-smi">adj.P.Val</span>     <span class="pl-smi">B</span>
    <span class="pl-ii">39400_at</span>  <span class="pl-c1">1.0285</span>   <span class="pl-c1">5.621</span>  <span class="pl-c1">5.836</span> <span class="pl-c1">8.341e-06</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.234</span>
    <span class="pl-ii">36131_at</span> <span class="pl-k">-</span><span class="pl-c1">0.9888</span>   <span class="pl-c1">9.954</span> <span class="pl-k">-</span><span class="pl-c1">5.772</span> <span class="pl-c1">9.668e-06</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.117</span>
    <span class="pl-ii">33791_at</span> <span class="pl-k">-</span><span class="pl-c1">1.8302</span>   <span class="pl-c1">6.951</span> <span class="pl-k">-</span><span class="pl-c1">5.736</span> <span class="pl-c1">1.049e-05</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.052</span>
    <span class="pl-ii">1303_at</span>   <span class="pl-c1">1.3836</span>   <span class="pl-c1">4.463</span>  <span class="pl-c1">5.732</span> <span class="pl-c1">1.060e-05</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.044</span>
    <span class="pl-ii">36122_at</span> <span class="pl-k">-</span><span class="pl-c1">0.7801</span>   <span class="pl-c1">7.260</span> <span class="pl-k">-</span><span class="pl-c1">5.141</span> <span class="pl-c1">4.206e-05</span>   <span class="pl-c1">0.10619</span> <span class="pl-c1">1.935</span>
    <span class="pl-ii">36939_at</span> <span class="pl-k">-</span><span class="pl-c1">2.5472</span>   <span class="pl-c1">6.915</span> <span class="pl-k">-</span><span class="pl-c1">5.038</span> <span class="pl-c1">5.362e-05</span>   <span class="pl-c1">0.11283</span> <span class="pl-c1">1.737</span>
    <span class="pl-ii">41398_at</span>  <span class="pl-c1">0.5187</span>   <span class="pl-c1">7.602</span>  <span class="pl-c1">4.879</span> <span class="pl-c1">7.824e-05</span>   <span class="pl-c1">0.11520</span> <span class="pl-c1">1.428</span>
    <span class="pl-ii">32599_at</span>  <span class="pl-c1">0.8544</span>   <span class="pl-c1">5.746</span>  <span class="pl-c1">4.859</span> <span class="pl-c1">8.207e-05</span>   <span class="pl-c1">0.11520</span> <span class="pl-c1">1.389</span>
    <span class="pl-ii">36129_at</span>  <span class="pl-c1">0.9161</span>   <span class="pl-c1">8.209</span>  <span class="pl-c1">4.859</span> <span class="pl-c1">8.212e-05</span>   <span class="pl-c1">0.11520</span> <span class="pl-c1">1.389</span>
    <span class="pl-ii">37636_at</span> <span class="pl-k">-</span><span class="pl-c1">1.6868</span>   <span class="pl-c1">5.697</span> <span class="pl-k">-</span><span class="pl-c1">4.804</span> <span class="pl-c1">9.355e-05</span>   <span class="pl-c1">0.11811</span> <span class="pl-c1">1.282</span>
    <span class="pl-k">&gt;</span></pre>
</blockquote>
</div>
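<p>Many other packages also use the <code>ExpressionSet</code> object; I will not list them one by one.</p>
<h2><a id="user-content-自己构造-expressionset-对象" class="anchor" href="https://github.com/bioconductor-china/basic/blob/master/ExpressionSet.md#%E8%87%AA%E5%B7%B1%E6%9E%84%E9%80%A0-expressionset-%E5%AF%B9%E8%B1%A1"></a>Constructing an <code>ExpressionSet</code> object yourself</h2>
<blockquote><p>From the explanation above, we know that this object is quite simple: it is a wrapper around an expression matrix plus the sample grouping information. So we use the exprMatrix and meta obtained above to build an ExpressionSet object. The Biobase package documents this in detail, and reading the official manual carefully is recommended.</p></blockquote>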
<div class="highlight highlight-source-r">
<blockquote>
<pre>    <span class="pl-smi">metadata</span> <span class="pl-k">&lt;-</span> <span class="pl-k">data.frame</span>(<span class="pl-v">labelDescription</span><span class="pl-k">=</span>c(<span class="pl-s"><span class="pl-pds">'</span>SampleID<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>Disease<span class="pl-pds">'</span></span>),
                       <span class="pl-v">row.names</span><span class="pl-k">=</span>c(<span class="pl-s"><span class="pl-pds">'</span>SampleID<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>Disease<span class="pl-pds">'</span></span>))
    <span class="pl-smi">phenoData</span> <span class="pl-k">&lt;-</span> new(<span class="pl-s"><span class="pl-pds">"</span>AnnotatedDataFrame<span class="pl-pds">"</span></span>,<span class="pl-v">data</span><span class="pl-k">=</span><span class="pl-smi">meta</span>,<span class="pl-v">varMetadata</span><span class="pl-k">=</span><span class="pl-smi">metadata</span>)
    <span class="pl-smi">myExpressionSet</span> <span class="pl-k">&lt;-</span> ExpressionSet(<span class="pl-v">assayData</span><span class="pl-k">=</span><span class="pl-smi">exprMatrix</span>,
                                     <span class="pl-v">phenoData</span><span class="pl-k">=</span><span class="pl-smi">phenoData</span>,
                                     <span class="pl-v">annotation</span><span class="pl-k">=</span><span class="pl-s"><span class="pl-pds">"</span>hgu95av2<span class="pl-pds">"</span></span>)
    <span class="pl-k">&gt;</span> <span class="pl-smi">myExpressionSet</span>
    ExpressionSet (<span class="pl-smi">storageMode</span><span class="pl-k">:</span> <span class="pl-smi">lockedEnvironment</span>)
    <span class="pl-smi">assayData</span><span class="pl-k">:</span> <span class="pl-c1">12625</span> <span class="pl-smi">features</span>, <span class="pl-c1">22</span> <span class="pl-smi">samples</span> 
      <span class="pl-smi">element</span> <span class="pl-smi">names</span><span class="pl-k">:</span> <span class="pl-smi">exprs</span> 
    <span class="pl-smi">protocolData</span><span class="pl-k">:</span> <span class="pl-smi">none</span>
    <span class="pl-smi">phenoData</span>
      <span class="pl-smi">sampleNames</span><span class="pl-k">:</span> <span class="pl-smi">CLL11.CEL</span> <span class="pl-smi">CLL12.CEL</span> <span class="pl-k">...</span> CLL9.CEL (<span class="pl-c1">22</span> <span class="pl-smi">total</span>)
      <span class="pl-smi">varLabels</span><span class="pl-k">:</span> <span class="pl-smi">SampleID</span> <span class="pl-smi">Disease</span>
      <span class="pl-smi">varMetadata</span><span class="pl-k">:</span> <span class="pl-smi">labelDescription</span>
    <span class="pl-smi">featureData</span><span class="pl-k">:</span> <span class="pl-smi">none</span>
    <span class="pl-smi">experimentData</span><span class="pl-k">:</span> <span class="pl-smi">use</span> <span class="pl-s"><span class="pl-pds">'</span>experimentData(object)<span class="pl-pds">'</span></span>
    <span class="pl-smi">Annotation</span><span class="pl-k">:</span> <span class="pl-smi">hgu95av2</span> 
    <span class="pl-k">&gt;</span></pre>
</blockquote>
</div>
<blockquote><p>As the construction above shows, the essentials are the expression matrix plus the sample grouping information.</p></blockquote>
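<p>The same construction can be tried without the CLL data. The sketch below builds a tiny object from simulated values; the probe names, sample names and group labels are all invented for illustration, and it uses the <code>AnnotatedDataFrame()</code> constructor rather than <code>new()</code>.</p>

```r
library(Biobase)

set.seed(1)
# Hypothetical toy data: 6 probes x 4 samples of simulated log-expression values.
expr_matrix <- matrix(rnorm(24, mean = 8), nrow = 6,
                      dimnames = list(paste0("probe", 1:6), paste0("S", 1:4)))

# Sample table: its row names must equal the column names of the matrix.
meta <- data.frame(SampleID = paste0("S", 1:4),
                   Disease  = c("progres.", "progres.", "stable", "stable"),
                   row.names = colnames(expr_matrix))

# AnnotatedDataFrame() supplies a default varMetadata when none is given.
pheno <- AnnotatedDataFrame(meta)
toy_eset <- ExpressionSet(assayData = expr_matrix, phenoData = pheno)

validObject(toy_eset)  # throws an error if the pieces are inconsistent
dim(toy_eset)          # number of features and samples
```

<p>If the row names of <code>meta</code> did not match the column names of <code>expr_matrix</code>, the constructor would refuse to build the object, which is the main safety net this class provides.</p>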
<h2><a id="user-content-其它例子" class="anchor" href="https://github.com/bioconductor-china/basic/blob/master/ExpressionSet.md#%E5%85%B6%E5%AE%83%E4%BE%8B%E5%AD%90"></a>Other examples</h2>
<h3><a id="user-content-all包的数据自带-expressionset-对象" class="anchor" href="https://github.com/bioconductor-china/basic/blob/master/ExpressionSet.md#all%E5%8C%85%E7%9A%84%E6%95%B0%E6%8D%AE%E8%87%AA%E5%B8%A6-expressionset-%E5%AF%B9%E8%B1%A1"></a>The ALL package ships its data as an <code>ExpressionSet</code> object</h3>
<div class="highlight highlight-source-r">
<pre>    library(<span class="pl-smi">ALL</span>)
    data(<span class="pl-smi">ALL</span>)
    <span class="pl-smi">ALL</span>

    ExpressionSet (<span class="pl-smi">storageMode</span><span class="pl-k">:</span> <span class="pl-smi">lockedEnvironment</span>)
    <span class="pl-smi">assayData</span><span class="pl-k">:</span> <span class="pl-c1">12625</span> <span class="pl-smi">features</span>, <span class="pl-c1">128</span> <span class="pl-smi">samples</span>
        <span class="pl-smi">element</span> <span class="pl-smi">names</span><span class="pl-k">:</span> <span class="pl-smi">exprs</span>
    <span class="pl-smi">protocolData</span><span class="pl-k">:</span> <span class="pl-smi">none</span>
    <span class="pl-smi">phenoData</span>
        <span class="pl-smi">sampleNames</span><span class="pl-k">:</span> <span class="pl-c1">01005</span> <span class="pl-c1">01010</span> … LAL4 (<span class="pl-c1">128</span> <span class="pl-smi">total</span>)
        <span class="pl-smi">varLabels</span><span class="pl-k">:</span> <span class="pl-smi">cod</span> <span class="pl-smi">diagnosis</span> … <span class="pl-smi">date</span> <span class="pl-smi">last</span> seen (<span class="pl-c1">21</span> <span class="pl-smi">total</span>)
        <span class="pl-smi">varMetadata</span><span class="pl-k">:</span> <span class="pl-smi">labelDescription</span>
    <span class="pl-smi">featureData</span><span class="pl-k">:</span> <span class="pl-smi">none</span>
    <span class="pl-smi">experimentData</span><span class="pl-k">:</span> <span class="pl-smi">use</span> ‘experimentData(<span class="pl-smi">object</span>)’
    <span class="pl-smi">pubMedIds</span><span class="pl-k">:</span> <span class="pl-c1">14684422</span> <span class="pl-c1">16243790</span> 
    <span class="pl-smi">Annotation</span><span class="pl-k">:</span> <span class="pl-smi">hgu95av2</span></pre>
</div>
<p>This dataset is very well known, and many other method packages use it in their examples. Only once you truly understand the ExpressionSet object can you learn the Bioconductor family of packages.</p>
<h2><a id="user-content-用geoquery包来下载得到-expressionset-对象" class="anchor" href="https://github.com/bioconductor-china/basic/blob/master/ExpressionSet.md#%E7%94%A8geoquery%E5%8C%85%E6%9D%A5%E4%B8%8B%E8%BD%BD%E5%BE%97%E5%88%B0-expressionset-%E5%AF%B9%E8%B1%A1"></a>Downloading an <code>ExpressionSet</code> object with the GEOquery package</h2>
<div class="highlight highlight-source-r">
<pre>    <span class="pl-v">gse1009</span><span class="pl-k">=</span><span class="pl-e">GEOquery</span><span class="pl-k">::</span>getGEO(<span class="pl-s"><span class="pl-pds">"</span>GSE1009<span class="pl-pds">"</span></span>)
    <span class="pl-smi">gse1009</span>[[<span class="pl-c1">1</span>]] <span class="pl-c">## this is the ExpressionSet object</span></pre>
<p>I find that 糗世界 explains this better than I do: http://blog.qiubio.com:8080/archives/2957</p>
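<p>In the Biobase base package, ExpressionSet is a very important class. Bioconductor was originally designed for analyzing microarray data, and ExpressionSet is the standard that Bioconductor defined for gene expression data. It is the basic data type on which packages that handle gene expression data in Bioconductor operate, for example affyPLM, affy, oligo, limma, arrayMagic and so on. So when learning Bioconductor, the first task is to understand and master ExpressionSet thoroughly.</p>
<p>Components of an ExpressionSet:</p>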
<ul>
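<li><i></i>assayData: a matrix or an environment that stores the expression values.<br />
When it is a matrix, its rows are the probe sets (also called features; in any case a set of unique identifiers) and its columns are the samples. If it has row names or column names, the row names must match the row names of featureData, and the column names are the sample names, which must match the row names of phenoData. Calling the exprs() method returns this matrix.<br />
When it is an environment, it must contain a matrix named exprs that fits the description above (other matrices of the same dimensions may be stored alongside it); that exprs element is what the exprs() method returns.</li>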
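<li><i></i>Header: describes data about the experimental platform, including phenoData, featureData, protocolData, annotation and so on. In detail:<br />
phenoData is a data.frame or AnnotatedDataFrame that holds the sample information. If it has row names, they must match the column names of assayData (that is, the sample names). If it has no row names, its number of rows must equal the number of columns of assayData.<br />
featureData is a data.frame or AnnotatedDataFrame that holds the feature information. Its number of rows must equal the number of rows of assayData. If it has row names, they must match the row names of assayData.<br />
annotation is a character string that holds the array type, for example hgu95av2.<br />
protocolData holds equipment-related data. It is an AnnotatedDataFrame, and its number of rows must equal the number of samples (columns) of assayData.</li>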
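<li><i></i>experimentData: a MIAME object that stores information about the experimental design, such as the lab name, the published papers and so on. So what is the MIAME class? MIAME stands for Minimum Information About a Microarray Experiment, and it has the following slots: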
<ol>
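<li><i></i>name: character string, the name of the experimenter</li>
<li><i></i>lab: character string, the name of the laboratory</li>
<li><i></i>contact: character string, contact information</li>
<li><i></i>title: character string, a one-sentence description of the experiment</li>
<li><i></i>abstract: character string, the abstract of the experiment</li>
<li><i></i>url: character string, a URL related to the experiment</li>
<li><i></i>samples: list, information about the samples</li>
<li><i></i>hybridizations: list, information about the hybridizations</li>
<li><i></i>normControls: list, information about the controls, such as housekeeping genes</li>
<li><i></i>preprocessing: list, how the raw data were preprocessed</li>
<li><i></i>pubMedIds: character string, PubMed identifiers</li>
<li><i></i>other: list, any other relevant information</li>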
</ol>
<p>With these, essentially all of the experiment-related information is covered.</p></li>
</ul>
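<p>To make the list of components concrete, here is a small sketch that attaches a MIAME description and an annotation string to a toy object and reads them back. Every field value is a made-up placeholder, and <code>toy_eset</code> and <code>exp_info</code> are hypothetical names.</p>

```r
library(Biobase)

# All field values below are invented placeholders.
exp_info <- MIAME(name     = "Example Researcher",
                  lab      = "Example Lab",
                  contact  = "someone@example.org",
                  title    = "Toy expression experiment",
                  abstract = "A placeholder abstract.",
                  url      = "http://www.example.org",
                  other    = list(notes = "created for illustration"))

# Hypothetical 4 probes x 3 samples matrix; phenoData is left at its default.
expr_matrix <- matrix(1:12 / 2, nrow = 4,
                      dimnames = list(paste0("probe", 1:4), paste0("S", 1:3)))

toy_eset <- ExpressionSet(assayData      = expr_matrix,
                          experimentData = exp_info,
                          annotation     = "hgu95av2")

experimentData(toy_eset)  # prints a summary of the MIAME object
annotation(toy_eset)      # the array type string
featureNames(toy_eset)    # row names of assayData
sampleNames(toy_eset)     # column names of assayData
```

<p>Only <code>assayData</code> was supplied with real content here; the constructor fills the remaining slots with empty defaults, which matches the point that the other components are optional.</p>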
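<p>ExpressionSet inherits from the eSet class, and its slots are largely the same as those of eSet.</p>
<p>So which slots of an ExpressionSet are required, and which may be missing? Clearly assayData is required. The others may be missing, but not all of them, because otherwise the analysis could not be carried out.</p>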
<pre> 

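<span style="color: #ff0000;"><strong>The most important operation on an ExpressionSet is taking a subset.</strong></span> Sometimes, after quality assessment, we are unhappy with the data from some samples and want to drop them from an already instantiated ExpressionSet, or we want to split the samples into groups. Both cases call for subsetting. So how do we take a subset?
We can subset it just as we would a matrix: vv &lt;- exampleSet[1:5, 1:3]</pre>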
<pre>Or subset it by one of its sample variables: males &lt;- exampleSet[, exampleSet$gender == "Male"]</pre>
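<p>The two subsetting forms can be run end to end on a toy object. In this sketch <code>exampleSet</code> is built from simulated values, and the <code>gender</code> column and its values are invented so that the expressions from the text work as written.</p>

```r
library(Biobase)

set.seed(2)
# Hypothetical toy data: 8 probes x 5 samples.
expr_matrix <- matrix(rnorm(40), nrow = 8,
                      dimnames = list(paste0("probe", 1:8), paste0("S", 1:5)))

# "gender" mirrors the column name used in the text; the values are made up.
pheno <- AnnotatedDataFrame(
  data.frame(gender = c("Male", "Female", "Male", "Female", "Male"),
             row.names = colnames(expr_matrix)))

exampleSet <- ExpressionSet(assayData = expr_matrix, phenoData = pheno)

vv    <- exampleSet[1:5, 1:3]                       # first 5 features, first 3 samples
males <- exampleSet[, exampleSet$gender == "Male"]  # keep only the male samples

dim(vv)
sampleNames(males)
pData(males)  # the sample table is subset together with the matrix
```

<p>Because the expression matrix and the sample table are subset together, they cannot drift out of sync, which is the practical reason for subsetting the object instead of the raw matrix.</p>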
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1510.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
