<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Differential Analysis</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e5%b7%ae%e5%bc%82%e5%88%86%e6%9e%90/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
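	<description>Feel free to leave comments and join the discussion on the forum biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>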
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Self-Study miRNA-seq Analysis, Part 7: Getting mRNA Expression Data for the Samples Paired with the miRNA Samples</title>
		<link>http://www.bio-info-trainee.com/1716.html</link>
		<comments>http://www.bio-info-trainee.com/1716.html#comments</comments>
		<pubDate>Fri, 01 Jul 2016 15:57:59 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[tutorial]]></category>
		<category><![CDATA[hgu133plus2]]></category>
		<category><![CDATA[limma]]></category>
		<category><![CDATA[miRNA-seq]]></category>
		<category><![CDATA[differential analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1716</guid>
		<description><![CDATA[This part is not really a self-study session on miRNA-seq analysis. In essence it is Affymetrix mR &#8230; <a href="http://www.bio-info-trainee.com/1716.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
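				<content:encoded><![CDATA[<p>This part is not really a self-study session on miRNA-seq analysis. In essence it is an analysis of Affymetrix mRNA expression array data, on the most commonly used platform at that: GPL570 (HG-U133_Plus_2). Because these arrays were run on samples paired with the miRNA samples, and both sets of results will be used later for co-expression network analysis and the like, I am posting the analysis of the array data here. The paper states that "Messenger RNA expression analysis identified 731 probe sets with significant differential expression". The authors' list of significant genes from their differential analysis is available here:<span id="more-1716"></span>## <a href="http://journals.plos.org/plosone/article/asset?unique&amp;id=info:doi/10.1371/journal.pone.0108051.s002">http://journals.plos.org/plosone/article/asset?unique&amp;id=info:doi/10.1371/journal.pone.0108051.s002</a><br />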
## mRNA expression array - GSE60291  (Affymetrix Human Genome U133 Plus 2.0 Array)</p>
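<p>hgu133plus2 array data is extremely common. You can download the study's raw array data (CEL files) from GEO and analyze it with the affy and limma packages, or use the GEOquery package to download the expression matrix the authors already processed and run the differential analysis on it directly. I chose the latter. My approach differs from the authors' in one respect: I first annotate every probe with its gene, then keep only the probe with the highest mean expression for each gene. The authors instead ran the differential analysis on the probe-level expression matrix and annotated the probes in the results afterwards. I cannot say definitively which approach is better. The code is as follows:</p>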
<blockquote><p>rm(list=ls())<br />
library(GEOquery)<br />
library(limma)<br />
GSE60291 &lt;- getGEO('GSE60291', destdir=".",getGPL = F)</p>
<p># the expression matrix<br />
<strong><span style="color: #ff0000;">exprSet</span></strong>=exprs(GSE60291[[1]])<br />
library("annotate")<br />
GSE60291[[1]]<br />
## the group information<br />
pdata=pData(GSE60291[[1]])<br />
<span style="color: #ff0000;"><strong>treatment</strong></span>=factor(unlist(lapply(pdata$title,function(x) strsplit(as.character(x),"-")[[1]][1])))<br />
#treatment=relevel(treatment,'control')<br />
## gene annotation<br />
platformDB='hgu133plus2.db'<br />
library(platformDB, character.only=TRUE)<br />
probeset &lt;- featureNames(GSE60291[[1]])<br />
#EGID &lt;- as.numeric(lookUp(probeset, platformDB, "ENTREZID"))<br />
SYMBOL &lt;-  lookUp(probeset, platformDB, "SYMBOL")<br />
## for each gene, keep the probe with the highest mean expression<br />
a=cbind(SYMBOL,exprSet)<br />
## remove the duplicated probeset<br />
rmDupID &lt;-function(a=matrix(c(1,1:5,2,2:6,2,3:7),ncol=6)){<br />
exprSet=a[,-1]<br />
rowMeans=apply(exprSet,1,function(x) mean(as.numeric(x),na.rm=T))<br />
a=a[order(rowMeans,decreasing=T),]<br />
exprSet=a[!duplicated(a[,1]),]<br />
#<br />
exprSet=exprSet[!is.na(exprSet[,1]),]<br />
rownames(exprSet)=exprSet[,1]<br />
exprSet=exprSet[,-1]<br />
return(exprSet)<br />
}<br />
exprSet=rmDupID(a)<br />
rn=rownames(exprSet)<br />
exprSet=apply(exprSet,2,as.numeric)<br />
rownames(exprSet)=rn<br />
exprSet[1:4,1:4]<br />
#exprSet=log(exprSet) ## based on e<br />
boxplot(exprSet,las=2)<br />
## differential analysis of the array data with limma<br />
design=model.matrix(~ treatment)<br />
fit=lmFit(exprSet,design)<br />
fit=eBayes(fit)<br />
#vennDiagram(decideTests(fit))<br />
DEG=topTable(fit,coef=2,n=Inf,adjust='BH')<br />
dim(DEG[abs(DEG[,1])&gt;1.2 &amp; DEG[,5]&lt;0.05,])  ## 806 genes<br />
write.csv(DEG,"ET1-normal.DEG.csv")</p></blockquote>
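<p>The probe-to-gene collapsing step above can be hard to follow inside the full script, so here is a minimal base-R sketch of the same idea on a toy matrix. The probe names, gene symbols, and values are all made up for illustration, and unlike rmDupID the sketch keeps the matrix numeric throughout instead of binding the symbols into it.</p>

```r
## Toy data: 5 probes mapping to 3 genes; one probe has no symbol.
## All names and values are invented for illustration.
expr = matrix(c(5, 6,
                9, 8,
                2, 3,
                7, 7,
                1, 1),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("probe", 1:5), c("s1", "s2")))
symbol = c("GeneA", "GeneA", "GeneB", NA, "GeneC")

## Order probes by decreasing mean expression, then keep the first
## (highest-mean) probe seen for each gene symbol.
ord = order(rowMeans(expr), decreasing = TRUE)
expr_ord = expr[ord, , drop = FALSE]
symbol_ord = symbol[ord]
keep = !is.na(symbol_ord) & !duplicated(symbol_ord)
collapsed = expr_ord[keep, , drop = FALSE]
rownames(collapsed) = symbol_ord[keep]
collapsed
```

<p>For GeneA, which has two probes, only the probe with the higher row mean survives; the probe without a symbol is dropped.</p>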
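<p>The resulting file ET1-normal.DEG.csv holds our differential analysis results. Compared with the differential results provided in the paper, they are almost identical!</p>
<p>Filtering on |logFC| &gt; 1.2 and adjusted P value &lt; 0.05 yields 806 genes.</p>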
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1716.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Self-Study miRNA-seq Analysis, Part 6: Differential Analysis of miRNA Expression</title>
		<link>http://www.bio-info-trainee.com/1714.html</link>
		<comments>http://www.bio-info-trainee.com/1714.html#comments</comments>
		<pubDate>Fri, 01 Jul 2016 15:11:26 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[tutorial]]></category>
		<category><![CDATA[DESeq]]></category>
		<category><![CDATA[DESeq2]]></category>
		<category><![CDATA[miRNA-seq]]></category>
		<category><![CDATA[differential analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1714</guid>
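		<description><![CDATA[This part is the watershed of the miRNA-seq analysis series. The previous five parts covered reading the paper, downloading the data, aligning the reads, then &#8230; <a href="http://www.bio-info-trainee.com/1714.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This part is the watershed of the miRNA-seq analysis series. The previous five parts covered reading the paper, downloading the data, aligning the reads, and quantifying expression. That is routine pipeline work: a sequencing company normally delivers those results, and papers usually provide processed results for download. Analyzing a single sample on its own means little, though. Research generally focuses on differences in miRNA expression between conditions, followed by annotation, functional analysis, and network analysis. That is both the key part and the hard part. Here I take the miRNA expression values already processed by the paper's authors to show how the downstream analysis is done, starting with differential analysis:<span id="more-1714"></span>According to the paper, the samples are grouped as follows:</p>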
<blockquote>
<div>GSM1470353: control-CM, experiment1; Homo sapiens; miRNA-Seq   SRR1542714</div>
<div>GSM1470354: ET1-CM, experiment1; Homo sapiens; miRNA-Seq  SRR1542715</div>
<div>GSM1470355: control-CM, experiment2; Homo sapiens; miRNA-Seq SRR1542716</div>
<div>GSM1470356: ET1-CM, experiment2; Homo sapiens; miRNA-Seq SRR1542717</div>
<div>GSM1470357: control-CM, experiment3; Homo sapiens; miRNA-Seq SRR1542718</div>
<div>GSM1470358: ET1-CM, experiment3; Homo sapiens; miRNA-Seq SRR1542719</div>
<div>So there is sequencing data for 6 samples in two groups: simply a comparison of the CM cells with and without ET1 stimulation!</div>
</blockquote>
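<div>We also have the expression matrix for these 6 samples, measured as read counts, so a commonly used package such as DESeq2 or edgeR is the usual choice for differential analysis. There are a dozen or more other tools for this; I simply use the one I am most comfortable with as the example, DESeq2.</div>
<div>The code below is a bit long. Since I have covered DESeq2 usage several times in my Bioconductor tutorial series, I only paste the code here. The point is simply that we run the differential analysis and obtain a list of differentially expressed miRNAs.</div>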
<blockquote>
<div>### step8: differential expression analysis by R package for miRNA expression patterns:<br />
## The result reported in the paper is:<br />
## MicroRNA sequencing revealed over 250 known and 34 predicted novel miRNAs to be differentially expressed between ET-1 stimulated and unstimulated control hiPSC-CMs.<br />
## (FDR &lt; 0.1 and 1.5 fold change)<br />
rm(list=ls())<br />
setwd('J:\\miRNA_test\\paper_results')  ## put the paper's result files downloaded from GEO here<br />
sampleIDs=c()<br />
groupList=c()<br />
allFiles=list.files(pattern = '\\.txt$')  ## match only file names ending in .txt<br />
i=allFiles[1]<br />
sampleID=strsplit(i,"_")[[1]][1]<br />
treat=strsplit(i,"_")[[1]][4]<br />
dat=read.table(i,stringsAsFactors = F)<br />
colnames(dat)=c('miRNA',sampleID)<br />
groupList=c(groupList,treat)<br />
for (i in allFiles[-1]){<br />
sampleID=strsplit(i,"_")[[1]][1]<br />
treat=strsplit(i,"_")[[1]][4]<br />
a=read.table(i,stringsAsFactors = F)<br />
colnames(a)=c('miRNA',sampleID)<br />
dat=merge(dat,a,by='miRNA')<br />
groupList=c(groupList,treat)<br />
}</div>
<div>### The code above simply merges the 6 separate expression files into one expression matrix<br />
## we need to filter the low expression level miRNA<br />
exprSet=dat[,-1]<br />
rownames(exprSet)=dat[,1]<br />
suppressMessages(library(DESeq2))<br />
exprSet=ceiling(exprSet)<br />
(colData &lt;- data.frame(row.names=colnames(exprSet), groupList=groupList))</div>
<div>## DESeq2 really is this simple to use<br />
dds &lt;- DESeqDataSetFromMatrix(countData = exprSet,<br />
colData = colData,<br />
design = ~ groupList)<br />
dds &lt;- DESeq(dds)<br />
png("qc_dispersions.png", 1000, 1000, pointsize=20)<br />
plotDispEsts(dds, main="Dispersion plot")<br />
dev.off()<br />
res &lt;- results(dds)<br />
## draw a few plots, which serve as a kind of QC<br />
png("RAWvsNORM.png")<br />
rld &lt;- rlogTransformation(dds)<br />
exprSet_new=assay(rld)<br />
par(cex = 0.7)<br />
n.sample=ncol(exprSet)<br />
if(n.sample&gt;40) par(cex = 0.5)<br />
cols &lt;- rainbow(n.sample*1.2)<br />
par(mfrow=c(2,2))<br />
boxplot(exprSet,  col = cols,main="expression value",las=2)<br />
boxplot(exprSet_new, col = cols,main="expression value",las=2)<br />
hist(exprSet[,1])<br />
hist(exprSet_new[,1])<br />
dev.off()<br />
library(RColorBrewer)<br />
(mycols &lt;- brewer.pal(8, "Dark2")[1:length(unique(groupList))])</p>
<p># Sample distance heatmap<br />
sampleDists &lt;- as.matrix(dist(t(exprSet_new)))<br />
#install.packages("gplots",repos = "http://cran.us.r-project.org")<br />
library(gplots)<br />
png("qc-heatmap-samples.png", w=1000, h=1000, pointsize=20)<br />
heatmap.2(as.matrix(sampleDists), key=F, trace="none",<br />
col=colorpanel(100, "black", "white"),<br />
ColSideColors=mycols[as.integer(factor(groupList))], RowSideColors=mycols[as.integer(factor(groupList))],<br />
margin=c(10, 10), main="Sample Distance Matrix")<br />
dev.off()</p>
<p>png("MA.png")<br />
DESeq2::plotMA(res, main="DESeq2", ylim=c(-2,2))<br />
dev.off()<br />
## this is the key part: the differential analysis results<br />
resOrdered &lt;- res[order(res$padj),]<br />
resOrdered=as.data.frame(resOrdered)<br />
write.csv(resOrdered,"<span style="color: #ff0000;"><strong>deseq2.results.csv</strong></span>",quote = F)</p>
<p>## a few more plots, mainly to look at the differences between samples<br />
library(limma)<br />
plotMDS(log(counts(dds, normalized=TRUE) + 1))<br />
plotMDS(log(counts(dds, normalized=TRUE) + 1) - log(t( t(assays(dds)[["mu"]]) / sizeFactors(dds) ) + 1))<br />
plotMDS( assays(dds)[["counts"]] )  ## raw count<br />
plotMDS( assays(dds)[["mu"]] ) ##- fitted values.</p>
</div>
</blockquote>
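<p>The file-merging loop at the start of the script can be written more compactly. The sketch below is one possible alternative, not the original method: it uses small in-memory tables (made-up miRNA names and counts) in place of the files read with read.table, and joins them with Reduce and merge.</p>

```r
## Three invented per-sample tables standing in for the files on disk
tabs = list(
  S1 = data.frame(miRNA = c("miR-1", "miR-2", "miR-3"), count = c(10, 0, 5)),
  S2 = data.frame(miRNA = c("miR-2", "miR-1", "miR-3"), count = c(3, 12, 7)),
  S3 = data.frame(miRNA = c("miR-3", "miR-1", "miR-2"), count = c(6, 9, 1))
)
## Rename the count column after the sample so merged columns stay distinct
tabs = Map(function(d, id) { names(d)[2] = id; d }, tabs, names(tabs))
## Successive inner joins on the miRNA column
dat = Reduce(function(x, y) merge(x, y, by = "miRNA"), tabs)
exprSet = as.matrix(dat[, -1])
rownames(exprSet) = as.character(dat$miRNA)
exprSet
```

<p>As in the loop version, merge performs an inner join, so a miRNA missing from any one file is dropped from the combined matrix.</p>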
<div>Finally, with the differential analysis results in deseq2.results.csv, we can select the differentially expressed miRNAs that meet our FDR and fold-change criteria.</div>
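<p>As a rough illustration of that selection step, the sketch below applies the thresholds quoted from the paper (FDR &lt; 0.1 and a 1.5-fold change) to a small made-up data frame. The column names log2FoldChange and padj follow the DESeq2 results table; the values and row names are invented.</p>

```r
## Invented stand-in for the table written to deseq2.results.csv
res_df = data.frame(
  log2FoldChange = c(1.2, -0.9, 0.3, -2.0, 0.7),
  padj           = c(0.01, 0.04, 0.001, 0.5, NA),
  row.names      = paste0("miR-", 1:5)
)
fc_cut = log2(1.5)   # a 1.5-fold change expressed on the log2 scale
## padj can be NA, so exclude those rows explicitly
sig = subset(res_df, !is.na(padj) & padj < 0.1 & abs(log2FoldChange) > fc_cut)
rownames(sig)
```

<p>The fold-change cutoff has to be converted to the log2 scale because the results table reports log2 fold changes.</p>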
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1714.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differential Analysis of Microarray Data with the samr Package</title>
		<link>http://www.bio-info-trainee.com/1608.html</link>
		<comments>http://www.bio-info-trainee.com/1608.html#comments</comments>
		<pubDate>Thu, 05 May 2016 11:43:04 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[basic databases]]></category>
		<category><![CDATA[basic software]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[limma]]></category>
		<category><![CDATA[samr]]></category>
		<category><![CDATA[differential analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1608</guid>
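		<description><![CDATA[There is already a pile of tools and packages for differential analysis, and limma is very mature, so I was not going to &#8230; <a href="http://www.bio-info-trainee.com/1608.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<blockquote><p>There is already a pile of tools and packages for differential analysis, and limma is very mature, so I had no plans to cover this one. But a classmate happened to ask about the package, so I gave it a quick test and checked whether its results differ from limma's. While I was at it, I wrote down the test workflow!</p></blockquote>
<blockquote><p>Learning a package is actually simple: find its official page and read the manual! <a href="https://cran.r-project.org/web/packages/samr/samr.pdf">Manual link</a></p>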
</blockquote>
<p><span id="more-1608"></span></p>
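<p>The samr package is even simpler. It has one main function, <strong>SAM</strong>, which is exposed as two functions depending on the data type: <strong>SAMseq</strong> for high-throughput sequencing data and <strong>samr</strong> for microarray data. Here I cover only microarray data and then make a simple comparison with limma.</p>
<p>So all we need to do is prepare the data and learn to use the samr function!</p>
<p>As before, we use the test data from the CLL package to demonstrate, starting by building the expression matrix and the group information.</p>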
<blockquote>
<pre class="r"><code class="r"><span class="identifier">suppressPackageStartupMessages</span><span class="paren">(</span><span class="keyword">library</span><span class="paren">(</span><span class="identifier">CLL</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">data</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="paren">)</span>
<span class="identifier">exprSet</span><span class="operator">=</span><span class="identifier">exprs</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="paren">)</span>   <span class="comment">## sCLLex is a dataset object shipped with the CLL package</span>
<span class="identifier">samples</span><span class="operator">=</span><span class="identifier">sampleNames</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="paren">)</span>
<span class="identifier">pdata</span><span class="operator">=</span><span class="identifier">pData</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="paren">)</span>
<span class="identifier">group_list</span><span class="operator">=</span><span class="identifier">as.character</span><span class="paren">(</span><span class="identifier">pdata</span><span class="paren">[</span>,<span class="number">2</span><span class="paren">]</span><span class="paren">)</span>
<span class="identifier">group_list</span></code></pre>
<pre><code>##  [1] "progres." "stable"   "progres." "progres." "progres." "progres."
##  [7] "stable"   "stable"   "progres." "stable"   "progres." "stable"  
## [13] "progres." "stable"   "stable"   "progres." "progres." "progres."
## [19] "progres." "progres." "progres." "stable"</code></pre>
<pre class="r"><code class="r"><span class="identifier">as.numeric</span><span class="paren">(</span><span class="identifier">as.factor</span><span class="paren">(</span><span class="identifier">group_list</span><span class="paren">)</span><span class="paren">)</span></code></pre>
<pre><code>##  [1] 1 2 1 1 1 1 2 2 1 2 1 2 1 2 2 1 1 1 1 1 1 2</code></pre>
</blockquote>
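<p>The expression matrix exprSet and the group information group_list can now be used directly for differential analysis! samr's requirement for the group information is a bit unusual: it needs a numeric vector such as 1,1,1,2,2,2, so I used as.numeric(as.factor(group_list)); see the code below.</p>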
<blockquote>
<pre class="r"><code class="r"><span class="identifier">suppressPackageStartupMessages</span><span class="paren">(</span><span class="keyword">library</span><span class="paren">(</span><span class="identifier">samr</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">data</span><span class="operator">=</span><span class="identifier">list</span><span class="paren">(</span><span class="identifier">x</span><span class="operator">=</span><span class="identifier">exprSet</span>,<span class="identifier">y</span><span class="operator">=</span><span class="identifier">as.numeric</span><span class="paren">(</span><span class="identifier">as.factor</span><span class="paren">(</span><span class="identifier">group_list</span><span class="paren">)</span><span class="paren">)</span>, 
          <span class="identifier">geneid</span><span class="operator">=</span><span class="identifier">as.character</span><span class="paren">(</span><span class="number">1</span><span class="operator">:</span><span class="identifier">nrow</span><span class="paren">(</span><span class="identifier">exprSet</span><span class="paren">)</span><span class="paren">)</span>,
          <span class="identifier">genenames</span><span class="operator">=</span><span class="identifier">rownames</span><span class="paren">(</span><span class="identifier">exprSet</span><span class="paren">)</span>, 
          <span class="identifier">logged2</span><span class="operator">=</span><span class="literal">TRUE</span>
<span class="paren">)</span>
<span class="identifier">samr.obj</span><span class="operator">&lt;-</span><span class="identifier">samr</span><span class="paren">(</span><span class="identifier">data</span>, <span class="identifier">resp.type</span><span class="operator">=</span><span class="string">"Two class unpaired"</span>, <span class="identifier">nperms</span><span class="operator">=</span><span class="number">100</span><span class="paren">)</span></code></pre>
</blockquote>
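<p>That is really all it takes. What matters is how to tune the function's parameters and how to interpret what it returns (the samr.obj object is very important; whether you can truly make good use of samr depends on it).</p>
<p>My genenames here are actually probe IDs; change that for a real analysis. I set nperms to 100, which can also be changed; 1000 is typical.</p>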
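<p>One detail of the group encoding used above is easy to miss: as.factor sorts the labels alphabetically, so which group becomes class 1 depends on the label names. A minimal base-R sketch with a shortened, made-up label vector:</p>

```r
group_list = c("progres.", "stable", "progres.", "stable", "stable")
f = factor(group_list)
levels(f)            # levels are sorted alphabetically
y = as.numeric(f)    # class 1 = first level, class 2 = second level
y
## To make "stable" class 1 instead, set the level order explicitly
y2 = as.numeric(factor(group_list, levels = c("stable", "progres.")))
y2
```

<p>Fixing the level order explicitly also fixes the direction of the comparison, which matters when reading the sign of the resulting statistics.</p>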
<p>Besides finding differentially expressed genes directly, the package has several standalone functions.</p>
<p>The first normalizes the expression matrix:</p>
<blockquote>
<pre class="r"><code class="r"><span class="identifier">x.norm</span> <span class="operator">&lt;-</span> <span class="identifier">samr.norm.data</span><span class="paren">(</span><span class="identifier">data</span><span class="operator">$</span><span class="identifier">x</span><span class="paren">)</span>
<span class="identifier">par</span><span class="paren">(</span><span class="identifier">mfrow</span><span class="operator">=</span><span class="identifier">c</span><span class="paren">(</span><span class="number">1</span>,<span class="number">2</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">boxplot</span><span class="paren">(</span><span class="identifier">exprSet</span>, <span class="identifier">col</span> <span class="operator">=</span> <span class="identifier">rainbow</span><span class="paren">(</span><span class="identifier">ncol</span><span class="paren">(</span><span class="identifier">exprSet</span><span class="paren">)</span><span class="paren">)</span>,<span class="identifier">main</span><span class="operator">=</span><span class="string">"before normalization"</span>,<span class="identifier">las</span><span class="operator">=</span><span class="number">2</span><span class="paren">)</span>
<span class="identifier">boxplot</span><span class="paren">(</span><span class="identifier">x.norm</span>,  <span class="identifier">col</span> <span class="operator">=</span> <span class="identifier">rainbow</span><span class="paren">(</span><span class="identifier">ncol</span><span class="paren">(</span><span class="identifier">exprSet</span><span class="paren">)</span><span class="paren">)</span>,<span class="identifier">main</span><span class="operator">=</span><span class="string">"after normalization"</span>,<span class="identifier">las</span><span class="operator">=</span><span class="number">2</span><span class="paren">)</span></code></pre>
<a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/QQ截图20160505194154.png"><img class="alignnone size-full wp-image-1609" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/05/QQ截图20160505194154.png" alt="QQ截图20160505194154" width="720" height="503" /></a>
</blockquote>
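<p>Judging from the plots, normalization makes little visible difference.</p>
<p>I will not go through the remaining functions one by one; feel free to explore them yourself.</p>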
<p>* samr.plot(samr.obj, del, min.foldchange=0)</p>
<p>* samr.plot(samr.obj, del=.3)</p>
<p>* samr.assess.samplesize.obj&lt;- samr.assess.samplesize(samr.obj, data, log2(1.5))</p>
<p>* samr.assess.samplesize.plot(samr.assess.samplesize.obj)</p>
<p>Let us focus on how the differences found by samr compare with those found by limma.</p>
<blockquote>
<pre class="r"><code class="r"><span class="comment">## first, extract the p-values from the samr differential test</span>
<span class="identifier">pv</span><span class="operator">=</span><span class="identifier">samr.pvalues.from.perms</span><span class="paren">(</span><span class="identifier">samr.obj</span><span class="operator">$</span><span class="identifier">tt</span>, <span class="identifier">samr.obj</span><span class="operator">$</span><span class="identifier">ttstar</span><span class="paren">)</span>
<span class="comment">## then extract the p-values from the limma differential test</span>
<span class="keyword">library</span><span class="paren">(</span><span class="identifier">limma</span><span class="paren">)</span> 
<span class="identifier">design</span><span class="operator">=</span><span class="identifier">model.matrix</span><span class="paren">(</span><span class="operator">~</span><span class="identifier">factor</span><span class="paren">(</span><span class="identifier">sCLLex</span><span class="operator">$</span><span class="identifier">Disease</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">fit</span><span class="operator">=</span><span class="identifier">lmFit</span><span class="paren">(</span><span class="identifier">sCLLex</span>,<span class="identifier">design</span><span class="paren">)</span>
<span class="identifier">fit</span><span class="operator">=</span><span class="identifier">eBayes</span><span class="paren">(</span><span class="identifier">fit</span><span class="paren">)</span>
<span class="identifier">options</span><span class="paren">(</span><span class="identifier">digits</span> <span class="operator">=</span> <span class="number">4</span><span class="paren">)</span>
<span class="identifier">DEG_limma</span><span class="operator">=</span><span class="identifier">topTable</span><span class="paren">(</span><span class="identifier">fit</span>,<span class="identifier">coef</span><span class="operator">=</span><span class="number">2</span>,<span class="identifier">adjust</span><span class="operator">=</span><span class="string">'BH'</span>,<span class="identifier">n</span><span class="operator">=</span><span class="literal">Inf</span><span class="paren">)</span> 
<span class="identifier">pv_limma</span><span class="operator">=</span><span class="identifier">DEG_limma</span><span class="operator">$</span><span class="identifier">P.Value</span>
<span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv_limma</span><span class="paren">)</span><span class="operator">=</span><span class="identifier">rownames</span><span class="paren">(</span><span class="identifier">DEG_limma</span><span class="paren">)</span>
<span class="identifier">head</span><span class="paren">(</span><span class="identifier">pv</span><span class="paren">[</span><span class="identifier">sort</span><span class="paren">(</span><span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv</span><span class="paren">)</span><span class="paren">)</span><span class="paren">]</span><span class="paren">)</span></code></pre>
<pre><code>##  100_g_at   1000_at   1001_at 1002_f_at 1003_s_at   1004_at 
##    0.2531    0.4144    0.5671    0.5686    0.4687    0.6340</code></pre>
<pre class="r"><code class="r"><span class="identifier">head</span><span class="paren">(</span><span class="identifier">pv_limma</span><span class="paren">[</span><span class="identifier">sort</span><span class="paren">(</span><span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv_limma</span><span class="paren">)</span><span class="paren">)</span><span class="paren">]</span><span class="paren">)</span></code></pre>
<pre><code>##  100_g_at   1000_at   1001_at 1002_f_at 1003_s_at   1004_at 
##    0.2497    0.4312    0.5349    0.5498    0.4361    0.6473</code></pre>
<pre class="r"><code class="r"><span class="identifier">cor</span><span class="paren">(</span><span class="identifier">pv</span><span class="paren">[</span><span class="identifier">sort</span><span class="paren">(</span><span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv</span><span class="paren">)</span><span class="paren">)</span><span class="paren">]</span>,<span class="identifier">pv_limma</span><span class="paren">[</span><span class="identifier">sort</span><span class="paren">(</span><span class="identifier">names</span><span class="paren">(</span><span class="identifier">pv_limma</span><span class="paren">)</span><span class="paren">)</span><span class="paren">]</span><span class="paren">)</span></code></pre>
<pre><code>## [1] 0.9976</code></pre>
</blockquote>
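<p>Numerically there is no essential difference, and the correlation coefficient is as high as 0.9976.</p>
<p>So the conclusion is that there is no need for so many packages: limma is enough, and even a plain t-test would do.</p>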
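<p>For reference, here is a hedged sketch of what a plain per-probe t-test could look like in base R. The data is simulated rather than the CLL set, and the choice of an equal-variance test with BH adjustment is my assumption for illustration, not something taken from the comparison above.</p>

```r
set.seed(1)
## Simulated log-scale matrix: 50 "probes" x 8 samples, 4 per group
x = matrix(rnorm(50 * 8, mean = 8), nrow = 50,
           dimnames = list(paste0("p", 1:50), paste0("s", 1:8)))
g = factor(rep(c("A", "B"), each = 4))
x[1:5, g == "B"] = x[1:5, g == "B"] + 3   # spike in 5 truly different probes
## One equal-variance two-sample t-test per row
pv_t = apply(x, 1, function(v) {
  t.test(v[g == "A"], v[g == "B"], var.equal = TRUE)$p.value
})
padj = p.adjust(pv_t, method = "BH")   # Benjamini-Hochberg adjustment
head(sort(pv_t))
```

<p>With very few samples per group, the per-probe variance estimates in an ordinary t-test are noisy, which is the situation limma's moderated statistics are designed to help with.</p>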
<p>plot and summary can also be applied directly to the samr result object samr.obj.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1608.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differential Analysis of RNA-seq Data with the voom Function in limma</title>
		<link>http://www.bio-info-trainee.com/1544.html</link>
		<comments>http://www.bio-info-trainee.com/1544.html#comments</comments>
		<pubDate>Mon, 11 Apr 2016 14:36:05 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[基础软件]]></category>
		<category><![CDATA[bioinformatics basics]]></category>
		<category><![CDATA[RNA-seq]]></category>
		<category><![CDATA[voom]]></category>
		<category><![CDATA[differential analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1544</guid>
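		<description><![CDATA[limma truly deserves its place as the most popular differential analysis package. More than ten years on, it remains a great help for processing microarray data. &#8230; <a href="http://www.bio-info-trainee.com/1544.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>limma truly deserves its place as the most popular differential analysis package. More than ten years on, it remains a great help for processing microarray data.</p>
<p>Now it supports RNA-seq data as well, so I tried it right away!</p>
<p>Below I only show the usage; the code speaks for itself!</p>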
<p><span id="more-1544"></span></p>
<blockquote><p>##<br />
<span style="color: #ff00ff;">library(limma)</span><br />
library(pasilla)<br />
data(pasillaGenes)<br />
exprSet=counts(pasillaGenes)<br />
group_list=pasillaGenes$condition<br />
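## you only need to build the expression matrix exprSet and the grouping factor group_list yourself; usually there are just two groups!<br />
## normally you read in the per-gene read counts from your own RNA-seq data for the analysis;</p>
<p>## do not use an already-normalized expression matrix such as RPKM.<br />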
suppressMessages(library(limma))<br />
design &lt;- model.matrix(~factor(group_list))<br />
colnames(design)=levels(factor(group_list))<br />
rownames(design)=colnames(exprSet)<br />
<span style="color: #ff0000;">v &lt;- voom(exprSet,design,normalize.method="quantile") ## this is the key step</span><br />
## from here on it is the same as regular limma usage!<br />
fit &lt;- lmFit(v,design)<br />
fit2 &lt;- eBayes(fit)<br />
tempOutput = topTable(fit2, coef=2, n=Inf)<br />
DEG_voom = na.omit(tempOutput)<br />
head(DEG_voom)</p></blockquote>
<p>voom also applies a statistical method to normalize the per-gene RNA-seq read counts.</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/QQ图片201604111917361.png"><img class="alignnone  wp-image-1538" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/QQ图片201604111917361.png" alt="QQ图片20160411191736" width="674" height="388" /></a></p>
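<div>The figure makes this clear: the matrix of per-gene RNA-seq read counts, which is normally very widely dispersed, becomes an expression matrix resembling microarray data after normalization, and at that point a t-test could in fact be used directly to find differentially expressed genes!</div>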
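<p>To make the transformation more concrete, the sketch below computes log2 counts-per-million in base R with the same small offsets that voom applies (0.5 added to each count, 1 added to each library size). The count matrix is invented. This covers only the scale change: voom additionally estimates a mean-variance trend and passes precision weights to lmFit, and the quantile normalization requested in the code above is a further step not shown here.</p>

```r
## Made-up count matrix: 4 genes x 3 samples
counts = matrix(c(  0,  10,   5,
                  100, 200, 150,
                   20,  40,  30,
                  880, 750, 815),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
lib.size = colSums(counts)   # total counts per sample
## log2 counts-per-million with small offsets, so zero counts stay finite
logcpm = t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
round(logcpm, 2)
```

<p>After this step the values sit on a log scale comparable to microarray intensities, which is why the usual lmFit and eBayes calls can follow.</p>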
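<div>However, if you have more than two groups, things get more complicated: you will need to read the manual carefully again, and may even need to consult whoever designed the experiment or a statistician!</div>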
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1544.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differential Analysis of RNA-seq Data with the DESeq2 Package in R</title>
		<link>http://www.bio-info-trainee.com/1533.html</link>
		<comments>http://www.bio-info-trainee.com/1533.html#comments</comments>
		<pubDate>Mon, 11 Apr 2016 11:21:35 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[basic software]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[DESeq]]></category>
		<category><![CDATA[DESeq2]]></category>
		<category><![CDATA[RNA-seq]]></category>
		<category><![CDATA[differential analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1533</guid>
		<description><![CDATA[I wrote about DESeq before, but that post is now outdated: http://www.bio-info-tra &#8230; <a href="http://www.bio-info-trainee.com/1533.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I wrote about DESeq before, but that post is now outdated: <a href="http://www.bio-info-trainee.com/867.html">http://www.bio-info-trainee.com/867.html</a></p>
<p>As I am getting ready to set up a Chinese Bioconductor community, here is a brief walkthrough of how to use the DESeq2 package!</p>
<p><span id="more-1533"></span></p>
<blockquote><p>library(DESeq2)<br />
library(limma)<br />
library(pasilla)<br />
data(pasillaGenes)<br />
exprSet=counts(pasillaGenes)  ## build the expression matrix<br />
group_list=pasillaGenes$condition ## build the grouping factor; that is all</p>
<p>(colData &lt;- data.frame(row.names=colnames(exprSet), group_list=group_list))<br />
dds &lt;- DESeqDataSetFromMatrix(countData = exprSet,<br />
colData = colData,<br />
design = ~ group_list)</p>
<p>## The above is step one: build the dds object, <span style="color: #ff0000;">which needs an expression matrix and a grouping table!</span></p>
<div>
<blockquote>
<div>dds2 &lt;- DESeq(dds)  ## step two: just call the DESeq function</div>
<div>resultsNames(dds2)</div>
<div>res &lt;-  results(dds2, contrast=c("group_list","treated","untreated"))</div>
<div>## extract the comparison you want; here the treated group is compared against the untreated group</div>
<div>resOrdered &lt;- res[order(res$padj),]</div>
<div>resOrdered=as.data.frame(resOrdered)</div>
</blockquote>
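<div>As you can see, the package is very easy to use!</div>
<div>It works only on raw per-gene read counts from RNA-seq; do not feed it an already-normalized expression matrix such as RPKM.</div>
<div>Also worth mentioning is DESeq2's own normalization and transformation method!</div>
<p>rld &lt;- rlogTransformation(dds2)  ## expression matrix after DESeq2's rlog transformation (normalized and on a log2 scale)<br />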
exprSet_new=assay(rld)<br />
par(cex = 0.7)<br />
n.sample=ncol(exprSet)<br />
if(n.sample&gt;40) par(cex = 0.5)<br />
cols &lt;- rainbow(n.sample*1.2)<br />
par(mfrow=c(2,2))<br />
boxplot(exprSet, col = cols,main="expression value",las=2)<br />
boxplot(exprSet_new, col = cols,main="expression value",las=2)<br />
hist(exprSet)<br />
hist(exprSet_new)</p>
</div>
</blockquote>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/QQ图片20160411191736.png"><img class="alignnone  wp-image-1534" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/QQ图片20160411191736.png" alt="QQ图片20160411191736" width="586" height="337" /></a></p>
<div>
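<div>The figure makes this clear: the matrix of per-gene RNA-seq read counts, which is normally very widely dispersed, becomes an expression matrix resembling microarray data after the transformation, and at that point a t-test could in fact be used directly to find differentially expressed genes!</div>
<div>However, if you have more than two groups, things get more complicated: you will need to read the manual carefully again, and may even need to consult whoever designed the experiment or a statistician!</div>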
</div>
</div>
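<p>As background for the normalization mentioned above: DESeq2's count normalization is commonly described as a median-of-ratios size factor per sample. The base-R sketch below illustrates that idea on made-up counts. It is a simplified illustration under that assumption, not DESeq2's actual code, and the rlog transformation used in the post does more than this (it also moves the data to a log2 scale and shrinks low-count genes).</p>

```r
## Made-up counts: sample s2 was sequenced exactly twice as deeply as s1
counts = cbind(s1 = c(10, 50, 200, 0),
               s2 = c(20, 100, 400, 0))
rownames(counts) = paste0("gene", 1:4)
## Per-gene log geometric mean across samples (-Inf if any count is zero)
log_geo_mean = rowMeans(log(counts))
## Size factor: median ratio of a sample to the per-gene geometric mean,
## using only genes whose geometric mean is finite
size_factor = apply(counts, 2, function(cnt) {
  ok = is.finite(log_geo_mean) & cnt > 0
  exp(median(log(cnt[ok]) - log_geo_mean[ok]))
})
size_factor
## Dividing each column by its size factor puts samples on a common scale
normalized = sweep(counts, 2, size_factor, "/")
normalized
```

<p>In this toy case the size factor of s2 is twice that of s1, so the two normalized columns coincide.</p>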
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1533.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Does Differential Analysis Need a Contrast Matrix?</title>
		<link>http://www.bio-info-trainee.com/1514.html</link>
		<comments>http://www.bio-info-trainee.com/1514.html#comments</comments>
		<pubDate>Sat, 09 Apr 2016 02:33:51 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[bioinformatics basics]]></category>
		<category><![CDATA[limma]]></category>
		<category><![CDATA[differential analysis]]></category>
		<category><![CDATA[contrast matrix]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1514</guid>
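		<description><![CDATA[The most popular differential analysis software is limma. It has since added the voom method, so it can both &#8230; <a href="http://www.bio-info-trainee.com/1514.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<blockquote><p>The most popular differential analysis software is limma. It has since added the voom method, so it can analyze both microarray data and high-throughput transcriptome sequencing data, and many other differential analysis tools follow a broadly similar workflow.</p></blockquote>
<p>I have mentioned before that differential analysis needs three pieces of data:</p>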
<ul>
<li>an expression matrix</li>
<li>a group (design) matrix</li>
<li>a contrast matrix</li>
</ul>
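<p>The first two are clearly required: you have an expression matrix, and the samples must be grouped before anything can be analyzed. But I have seen several examples, some with a contrast matrix and some without.</p>
<p>After studying the limma manual carefully, I found the question is actually quite simple.</p>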
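<p>One way to see where the difference may come from is to compare two parameterizations of the design matrix in base R. The grouping below is made up. With an intercept, the second column already represents the between-group difference; without an intercept, each column is a group mean, and a contrast (for example via limma's makeContrasts and contrasts.fit) is needed to compare them.</p>

```r
## Made-up grouping for six samples
group = factor(c("stable", "progres.", "stable", "progres.", "stable", "progres."))
## With an intercept: column 2 is the difference from the reference level
design_a = model.matrix(~ group)
colnames(design_a)
## Without an intercept: one column per group mean
design_b = model.matrix(~ 0 + group)
colnames(design_b)
```

<p>In the first form, asking for coef=2 returns the group difference directly, which would explain why some examples get by without a contrast matrix.</p>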
<h2><a id="user-content-大家仔细观察下面的两个代码" class="anchor" href="https://github.com/bioconductor-china/basic/blob/master/makeContrasts.md#大家仔细观察下面的两个代码"></a>Look closely at the two code examples below</h2>
<h3><a id="user-content-首先是不需要差异比较矩阵的" class="anchor" href="https://github.com/bioconductor-china/basic/blob/master/makeContrasts.md#首先是不需要差异比较矩阵的"></a>First, the one that does not need a contrast matrix</h3>
<div class="highlight highlight-source-r">
<blockquote>
<pre>    library(<span class="pl-smi">CLL</span>)
    data(<span class="pl-smi">sCLLex</span>)
    library(<span class="pl-smi">limma</span>)
    <span class="pl-v">design</span><span class="pl-k">=</span>model.matrix(<span class="pl-k">~</span><span class="pl-k">factor</span>(<span class="pl-smi">sCLLex</span><span class="pl-k">$</span><span class="pl-smi">Disease</span>))
    <span class="pl-v">fit</span><span class="pl-k">=</span>lmFit(<span class="pl-smi">sCLLex</span>,<span class="pl-smi">design</span>)
    <span class="pl-v">fit</span><span class="pl-k">=</span>eBayes(<span class="pl-smi">fit</span>)
    options(<span class="pl-v">digits</span> <span class="pl-k">=</span> <span class="pl-c1">4</span>)
    <span class="pl-c">#topTable(fit,coef=2,adjust='BH') </span>
    <span class="pl-k">&gt;</span> topTable(<span class="pl-smi">fit</span>,<span class="pl-v">coef</span><span class="pl-k">=</span><span class="pl-c1">2</span>,<span class="pl-v">adjust</span><span class="pl-k">=</span><span class="pl-s"><span class="pl-pds">'</span>BH<span class="pl-pds">'</span></span>)
               <span class="pl-smi">logFC</span> <span class="pl-smi">AveExpr</span>      <span class="pl-smi">t</span>   <span class="pl-smi">P.Value</span> <span class="pl-smi">adj.P.Val</span>     <span class="pl-smi">B</span>
    <span class="pl-ii">39400_at</span>  <span class="pl-c1">1.0285</span>   <span class="pl-c1">5.621</span>  <span class="pl-c1">5.836</span> <span class="pl-c1">8.341e-06</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.234</span>
    <span class="pl-ii">36131_at</span> <span class="pl-k">-</span><span class="pl-c1">0.9888</span>   <span class="pl-c1">9.954</span> <span class="pl-k">-</span><span class="pl-c1">5.772</span> <span class="pl-c1">9.668e-06</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.117</span>
    <span class="pl-ii">33791_at</span> <span class="pl-k">-</span><span class="pl-c1">1.8302</span>   <span class="pl-c1">6.951</span> <span class="pl-k">-</span><span class="pl-c1">5.736</span> <span class="pl-c1">1.049e-05</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.052</span>
    <span class="pl-ii">1303_at</span>   <span class="pl-c1">1.3836</span>   <span class="pl-c1">4.463</span>  <span class="pl-c1">5.732</span> <span class="pl-c1">1.060e-05</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.044</span>
    <span class="pl-ii">36122_at</span> <span class="pl-k">-</span><span class="pl-c1">0.7801</span>   <span class="pl-c1">7.260</span> <span class="pl-k">-</span><span class="pl-c1">5.141</span> <span class="pl-c1">4.206e-05</span>   <span class="pl-c1">0.10619</span> <span class="pl-c1">1.935</span>
    <span class="pl-ii">36939_at</span> <span class="pl-k">-</span><span class="pl-c1">2.5472</span>   <span class="pl-c1">6.915</span> <span class="pl-k">-</span><span class="pl-c1">5.038</span> <span class="pl-c1">5.362e-05</span>   <span class="pl-c1">0.11283</span> <span class="pl-c1">1.737</span>
    <span class="pl-ii">41398_at</span>  <span class="pl-c1">0.5187</span>   <span class="pl-c1">7.602</span>  <span class="pl-c1">4.879</span> <span class="pl-c1">7.824e-05</span>   <span class="pl-c1">0.11520</span> <span class="pl-c1">1.428</span>
    <span class="pl-ii">32599_at</span>  <span class="pl-c1">0.8544</span>   <span class="pl-c1">5.746</span>  <span class="pl-c1">4.859</span> <span class="pl-c1">8.207e-05</span>   <span class="pl-c1">0.11520</span> <span class="pl-c1">1.389</span>
    <span class="pl-ii">36129_at</span>  <span class="pl-c1">0.9161</span>   <span class="pl-c1">8.209</span>  <span class="pl-c1">4.859</span> <span class="pl-c1">8.212e-05</span>   <span class="pl-c1">0.11520</span> <span class="pl-c1">1.389</span>
    <span class="pl-ii">37636_at</span> <span class="pl-k">-</span><span class="pl-c1">1.6868</span>   <span class="pl-c1">5.697</span> <span class="pl-k">-</span><span class="pl-c1">4.804</span> <span class="pl-c1">9.355e-05</span>   <span class="pl-c1">0.11811</span> <span class="pl-c1">1.282</span>
</pre>
</blockquote>
</div>
<h3><a id="user-content-然后是需要差异比较矩阵的" class="anchor" href="https://github.com/bioconductor-china/basic/blob/master/makeContrasts.md#然后是需要差异比较矩阵的"></a>Next, the version that needs a contrast matrix</h3>
<div class="highlight highlight-source-r">
<blockquote>
<pre>    library(<span class="pl-smi">CLL</span>)
    data(<span class="pl-smi">sCLLex</span>)
    library(<span class="pl-smi">limma</span>)
    <span class="pl-v">design</span><span class="pl-k">=</span>model.matrix(<span class="pl-k">~</span><span class="pl-c1">0</span><span class="pl-k">+</span><span class="pl-k">factor</span>(<span class="pl-smi">sCLLex</span><span class="pl-k">$</span><span class="pl-smi">Disease</span>))
    colnames(<span class="pl-smi">design</span>)<span class="pl-k">=</span>c(<span class="pl-s"><span class="pl-pds">'</span>progres<span class="pl-pds">'</span></span>,<span class="pl-s"><span class="pl-pds">'</span>stable<span class="pl-pds">'</span></span>)
    <span class="pl-v">fit</span><span class="pl-k">=</span>lmFit(<span class="pl-smi">sCLLex</span>,<span class="pl-smi">design</span>)
    <span class="pl-v">cont.matrix</span><span class="pl-k">=</span>makeContrasts(<span class="pl-s"><span class="pl-pds">'</span>progres-stable<span class="pl-pds">'</span></span>,<span class="pl-v">levels</span> <span class="pl-k">=</span> <span class="pl-smi">design</span>)
    <span class="pl-v">fit2</span><span class="pl-k">=</span>contrasts.fit(<span class="pl-smi">fit</span>,<span class="pl-smi">cont.matrix</span>)
    <span class="pl-v">fit2</span><span class="pl-k">=</span>eBayes(<span class="pl-smi">fit2</span>)
    options(<span class="pl-v">digits</span> <span class="pl-k">=</span> <span class="pl-c1">4</span>)
    topTable(<span class="pl-smi">fit2</span>,<span class="pl-v">adjust</span><span class="pl-k">=</span><span class="pl-s"><span class="pl-pds">'</span>BH<span class="pl-pds">'</span></span>)

               <span class="pl-smi">logFC</span> <span class="pl-smi">AveExpr</span>      <span class="pl-smi">t</span>   <span class="pl-smi">P.Value</span> <span class="pl-smi">adj.P.Val</span>     <span class="pl-smi">B</span>
    <span class="pl-ii">39400_at</span> <span class="pl-k">-</span><span class="pl-c1">1.0285</span>   <span class="pl-c1">5.621</span> <span class="pl-k">-</span><span class="pl-c1">5.836</span> <span class="pl-c1">8.341e-06</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.234</span>
    <span class="pl-ii">36131_at</span>  <span class="pl-c1">0.9888</span>   <span class="pl-c1">9.954</span>  <span class="pl-c1">5.772</span> <span class="pl-c1">9.668e-06</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.117</span>
    <span class="pl-ii">33791_at</span>  <span class="pl-c1">1.8302</span>   <span class="pl-c1">6.951</span>  <span class="pl-c1">5.736</span> <span class="pl-c1">1.049e-05</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.052</span>
    <span class="pl-ii">1303_at</span>  <span class="pl-k">-</span><span class="pl-c1">1.3836</span>   <span class="pl-c1">4.463</span> <span class="pl-k">-</span><span class="pl-c1">5.732</span> <span class="pl-c1">1.060e-05</span>   <span class="pl-c1">0.03344</span> <span class="pl-c1">3.044</span>
    <span class="pl-ii">36122_at</span>  <span class="pl-c1">0.7801</span>   <span class="pl-c1">7.260</span>  <span class="pl-c1">5.141</span> <span class="pl-c1">4.206e-05</span>   <span class="pl-c1">0.10619</span> <span class="pl-c1">1.935</span>
    <span class="pl-ii">36939_at</span>  <span class="pl-c1">2.5472</span>   <span class="pl-c1">6.915</span>  <span class="pl-c1">5.038</span> <span class="pl-c1">5.362e-05</span>   <span class="pl-c1">0.11283</span> <span class="pl-c1">1.737</span>
    <span class="pl-ii">41398_at</span> <span class="pl-k">-</span><span class="pl-c1">0.5187</span>   <span class="pl-c1">7.602</span> <span class="pl-k">-</span><span class="pl-c1">4.879</span> <span class="pl-c1">7.824e-05</span>   <span class="pl-c1">0.11520</span> <span class="pl-c1">1.428</span>
    <span class="pl-ii">32599_at</span> <span class="pl-k">-</span><span class="pl-c1">0.8544</span>   <span class="pl-c1">5.746</span> <span class="pl-k">-</span><span class="pl-c1">4.859</span> <span class="pl-c1">8.207e-05</span>   <span class="pl-c1">0.11520</span> <span class="pl-c1">1.389</span>
    <span class="pl-ii">36129_at</span> <span class="pl-k">-</span><span class="pl-c1">0.9161</span>   <span class="pl-c1">8.209</span> <span class="pl-k">-</span><span class="pl-c1">4.859</span> <span class="pl-c1">8.212e-05</span>   <span class="pl-c1">0.11520</span> <span class="pl-c1">1.389</span>
    <span class="pl-ii">37636_at</span>  <span class="pl-c1">1.6868</span>   <span class="pl-c1">5.697</span>  <span class="pl-c1">4.804</span> <span class="pl-c1">9.355e-05</span>   <span class="pl-c1">0.11811</span> <span class="pl-c1">1.282</span></pre>
</blockquote>
</div>
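<p>Run both snippets and you will see that they give the same results: the same probes, with the same AveExpr, P.Value, adj.P.Val and B. Only the sign of logFC and t is flipped, because the first version reports stable minus progres while the contrast 'progres-stable' goes the other way.</p>
<p>Whether a contrast matrix is needed depends mainly on how the design matrix was built!</p>
<p>design=model.matrix(~factor(sCLLex$Disease))</p>
<p>design=model.matrix(~0+factor(sCLLex$Disease))</p>
<p>These two are fundamentally different!</p>
<p>The first form already encodes the comparisons in the columns of the design matrix. The first column is the intercept (the reference group) and can be ignored, and each later column compares one group with that reference. There are as many such columns as there are comparisons, and the coef argument of topTable pulls out the results for each one in turn.</p>
<p>The second form only assigns samples to groups. How the groups should be compared has to be specified separately in a contrast matrix, built with the makeContrasts function!</p>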
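<p>The difference between the two designs can be seen without limma at all. The following is a minimal sketch in base R, assuming a made-up six-sample vector for a single gene (the values and the use of lm.fit are illustrative assumptions, not part of the original analysis). It shows that column 2 of the intercept design already is a group comparison, while the no-intercept design needs a hand-written contrast whose direction sets the sign.</p>
<pre>
# Toy data: hypothetical values for one gene, not taken from the CLL dataset
group = factor(c('progres', 'progres', 'progres', 'stable', 'stable', 'stable'))
expr  = c(5.1, 4.9, 5.3, 6.2, 6.0, 6.4)

# Design with intercept: column 2 is already "stable minus progres"
design1 = model.matrix(~group)
fit1 = lm.fit(design1, expr)
coef_with_intercept = unname(fit1$coefficients[2])

# Design without intercept: one column per group mean, no comparison yet
design2 = model.matrix(~0 + group)
colnames(design2) = c('progres', 'stable')
fit2 = lm.fit(design2, expr)

# The contrast is written by hand; here it is progres minus stable
contrast = c(progres = 1, stable = -1)
coef_from_contrast = sum(fit2$coefficients * contrast)

print(coef_with_intercept)
print(coef_from_contrast)
</pre>
<p>The two printed values have the same size and opposite sign, which mirrors the flipped logFC columns in the two topTable outputs above. Writing the contrast as stable minus progres would make them agree exactly.</p>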
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1514.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differential gene analysis using the RankComp approach</title>
		<link>http://www.bio-info-trainee.com/1379.html</link>
		<comments>http://www.bio-info-trainee.com/1379.html#comments</comments>
		<pubDate>Fri, 22 Jan 2016 13:40:49 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[basic software]]></category>
		<category><![CDATA[RankComp]]></category>
		<category><![CDATA[differential analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1379</guid>
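		<description><![CDATA[Developed by researchers at Fujian Medical University. The paper explains in detail the statistical principle behind their differential analysis &#8230; <a href="http://www.bio-info-trainee.com/1379.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>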
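				<content:encoded><![CDATA[<div>RankComp was developed by researchers at Fujian Medical University.</div>
<div>
<div>The paper explains in detail the statistical principle behind their differential analysis method.</div>
<div>The gist: take expression data from several hundred normal samples of the same tissue, pair up the roughly 20,000 genes with one another, and test whether, within each pair, one gene is consistently expressed higher than the other across those samples!</div>
<div></div>
<p>I am not yet sure about this method and am only trying it out. Discussion is welcome!</p></div>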
<div></div>
<div>The paper is:</div>
<div>
<p>Wang H, Sun Q, Zhao W, et al. <b>Individual-level analysis of differential expression of genes and pathways for personalized medicine</b>[J]. Bioinformatics, 2014: btu522.</p>
<p>They packaged it as an R package that can be downloaded, but it requires R version 2.15.2. I tried it and found it hard to use!</p>
<p>The R code can be downloaded from <a href="http://bioinformatics.oxfordjournals.org/content/31/1/62/suppl/DC1">http://bioinformatics.oxfordjournals.org/content/31/1/62/suppl/DC1</a></p>
<p>Their program really is hard to use, but the algorithm is easy to follow, so you can write your own R implementation of the same procedure!</p></div>
<div></div>
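<div>For example, if gene A is expressed at around 3 in several hundred samples and gene B at around 5, and A is lower than B in 99% of the samples, then this is a stable gene pair!</div>
<div>In general, 20,000 genes form about 200 million gene pairs, of which roughly 10% to 40% are stable.</div>
<div>Each disease sample can then be tested to see whether these stable gene pairs have been disrupted!</div>
<div>For example, take the expression values of the 20,000 genes in one disease sample and pick one gene. Suppose it has 100 stable "up" gene pairs and 100 stable "down" gene pairs.</div>
<div>Then I check whether those pairs have changed. If 70 of the pairs are still up in the disease sample and 60 are still down, Fisher's exact test gives:</div>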
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/42.png"><img class="alignnone size-full wp-image-1380" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/42.png" alt="4" width="527" height="234" /></a></div>
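<div>So in this disease sample the gene is not differentially expressed relative to the normal pool! The P values from the test can of course be FDR-corrected afterwards.</div>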
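<div>The test above can be sketched in base R. The 2x2 layout used here is an assumption, since the post shows its own table only as an image: the 30 pairs that stopped being up are counted as down in the disease sample, and the 40 that stopped being down are counted as up. The two extra P values passed to p.adjust are made-up placeholders standing in for other genes.</div>
<pre>
# Counts from the example in the text
normal_up = 100
normal_down = 100
kept_up = 70
kept_down = 60

# Assumed bookkeeping: a reversed pair moves to the opposite direction
disease_up   = kept_up + (normal_down - kept_down)   # 70 + 40 = 110
disease_down = kept_down + (normal_up - kept_up)     # 60 + 30 = 90

tab = matrix(c(normal_up, normal_down, disease_up, disease_down),
             nrow = 2,
             dimnames = list(direction = c('up', 'down'),
                             sample = c('normal', 'disease')))
res = fisher.test(tab)
print(res$p.value)

# With one P value per gene, BH correction is applied across genes
p_values = c(res$p.value, 0.001, 0.2)   # last two are placeholders
fdr = p.adjust(p_values, method = 'BH')
print(fdr)
</pre>
<div>Under this layout the P value is well above 0.05, in line with the conclusion that the gene is not differentially expressed in this sample.</div>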
<div></div>
<div><b>Repeat this for every gene</b> and you know which genes are differentially expressed in this sample.</div>
<div></div>
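<div>With the principle covered, let's look at a concrete example:</div>
<div>First, download a human liver expression dataset from 2008 (Gene Expression in Human Liver), <b>all normal tissue, 427 samples in total.</b></div>
<div>The array is rather niche, though: it was custom-made for Merck, so you also need to download the probe-to-gene annotation file!</div>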
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9588">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9588</a></div>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL4372">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL4372</a></div>
<div></div>
<div>Read the GSE9588 data to get the expression matrix, then compute the rank matrix, and from that the comp matrix.</div>
<div><span style="font-family: Times New Roman;"><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/52.png"><img class="alignnone size-full wp-image-1381" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/52.png" alt="5" width="620" height="471" /></a></span></div>
<div><span style="font-family: Times New Roman;">&gt; table(rank_comp)</span></div>
<div><span style="font-family: Times New Roman;">rank_comp</span></div>
<div><span style="font-family: Times New Roman;">     down        no        up</span></div>
<div><span style="font-family: Times New Roman;">    58479 465752098     58479</span></div>
<div><span style="font-family: Times New Roman;">&gt;</span></div>
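<div>At first I did not know why so few pairs were stable in this dataset, and wondered whether something had gone wrong!</div>
<div>In fact my program was correct. The issue is that this dataset no longer contains plain expression values: the values for all 427 samples have had the expression of some reference sample subtracted from them.</div>
<div>This distorts the ordering of genes within each individual, so very few stable gene pairs are found!!!</div>
<div>Later, however, I downloaded the GTEx expression data and used the normal tissue samples there, which gave a great many stable gene pairs.</div>
<div>The actual code goes roughly like this:</div>
<div>Get the expression matrix for the normal tissue.</div>
<div>Then rank the expression values within each sample to get each sample's own gene ordering, which gives the rank matrix!</div>
<div>Process the rank matrix, checking for every gene pair whether it is stable, which gives the stability matrix!</div>
<div>Then, for each disease individual's expression values, loop over every gene to see whether it is differentially expressed!</div>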
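<div>The rank and stability steps above can be sketched as follows. This is a minimal illustration on simulated data, not the original program: the four gene names, the 200 simulated samples and the 0.99 cutoff are assumptions chosen to match the 99% rule described earlier.</div>
<pre>
set.seed(1)
# Hypothetical expression matrix: 4 genes (rows) x 200 normal samples (columns)
expr = rbind(geneA = rnorm(200, mean = 3, sd = 0.3),
             geneB = rnorm(200, mean = 5, sd = 0.3),
             geneC = rnorm(200, mean = 5, sd = 0.3),
             geneD = rnorm(200, mean = 8, sd = 0.3))

# Rank genes within each sample (each column)
rank_mat = apply(expr, 2, rank)

# For each gene pair, the fraction of samples in which gene i ranks above gene j
n_genes = nrow(rank_mat)
frac_higher = matrix(NA, n_genes, n_genes,
                     dimnames = list(rownames(expr), rownames(expr)))
for (i in seq_len(n_genes)) {
  for (j in seq_len(n_genes)) {
    frac_higher[i, j] = mean(rank_mat[i, ] > rank_mat[j, ])
  }
}

# 'up' if gene i is higher in at least 99% of samples, 'down' if gene j is
cutoff = 0.99
rank_comp = ifelse(frac_higher >= cutoff, 'up',
                   ifelse(t(frac_higher) >= cutoff, 'down', 'no'))
print(table(rank_comp))
</pre>
<div>Every stable pair is counted once as up and once as down, which is why the up and down counts in the table(rank_comp) output above are equal. For 20,000 genes a full double loop like this would be slow, so a real implementation would need something more efficient.</div>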
<div></div>
<div></div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1379.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interpreting logFC in limma differential analysis results</title>
		<link>http://www.bio-info-trainee.com/1209.html</link>
		<comments>http://www.bio-info-trainee.com/1209.html#comments</comments>
		<pubDate>Fri, 11 Dec 2015 16:00:06 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[miscellaneous notes]]></category>
		<category><![CDATA[foldchange]]></category>
		<category><![CDATA[logFC]]></category>
		<category><![CDATA[differential analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1209</guid>
		<description><![CDATA[First, we need to understand that the input limma takes is an expression matrix, and a log-transformed expression &#8230; <a href="http://www.bio-info-trainee.com/1209.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
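				<content:encoded><![CDATA[<p>First, we need to understand that the input limma takes is an expression matrix, and a log-transformed one (base 2).</p>
<p>The logFC column it reports is then simply the mean expression of the case group minus the mean expression of the control group in that input matrix. It can be positive or negative, indicating whether the gene is up- or down-regulated in case relative to control.</p>
<p>I used to be puzzled by this: it is plainly the difference between the case mean and the control mean, so what does it have to do with log fold change?</p>
<p>Eventually it clicked: because the input matrix is already log-transformed, the case mean and the control mean are both on the log scale, so their difference is the log of the fold change.</p>
<p>Start with what fold change means: if the mean expression is 8 in case and 2 in control, the fold change is 4 and the logFC is 2.</p>
<p>In limma, the case mean after log transformation is 3 and the control mean is 1, so the difference is 2, which means logFC is 2.</p>
<p>This is no coincidence, just a simple identity: log(x/y) = log(x) - log(y). Strictly speaking, the mean of log values is the log of the geometric mean, so logFC is the log ratio of the two groups' geometric means.</p>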
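<p>The arithmetic can be checked directly in R. This small sketch uses the numbers from the text (8 and 2); the second part, with the made-up values 4 and 16, shows why averaging on the log scale is not the same as taking the log of the plain average.</p>
<pre>
case_mean = 8
control_mean = 2

fold_change      = case_mean / control_mean               # 4
logfc_from_ratio = log2(fold_change)                      # 2
logfc_from_diff  = log2(case_mean) - log2(control_mean)   # 3 - 1 = 2
print(c(fold_change, logfc_from_ratio, logfc_from_diff))

# Averaging logs versus taking the log of the average
x = c(4, 16)
mean_of_logs = mean(log2(x))   # 3, the log2 of the geometric mean 8
log_of_mean  = log2(mean(x))   # log2(10), about 3.32
print(c(mean_of_logs, log_of_mean))
</pre>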
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1209.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differential analysis in an Excel spreadsheet</title>
		<link>http://www.bio-info-trainee.com/1205.html</link>
		<comments>http://www.bio-info-trainee.com/1205.html#comments</comments>
		<pubDate>Fri, 11 Dec 2015 15:24:45 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[basic software]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[differential analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1205</guid>
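		<description><![CDATA[The main point here is actually not to do differential analysis in Excel, but to make the principle of differential analysis clear, using exc &#8230; <a href="http://www.bio-info-trainee.com/1205.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The main point here is actually not to do differential analysis in Excel, but to make the principle of differential analysis clear. Working through it visually in Excel may make it easier to understand. I also want to show that bioinformatics analysis is fundamentally quite simple: for all the software out there, once you understand the principle you can write it yourself!</p>
<p>First, as always, get the expression matrix. Below, the green samples are the NASH group and the blue samples are the normal group.</p>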
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0012.png"><img class="alignnone size-full wp-image-1206" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0012.png" alt="image001" width="1253" height="478" /></a></p>
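<p>Differential analysis here is simple: check whether expression differs between the two groups, and the test used is the t-test.</p>
<p>=AVERAGE(D2:L2)    ## mean expression of the NASH group</p>
<p>=AVERAGE(M2:S2)    ### mean expression of the normal group</p>
<p>=T2-U2             ## the log fold change (NASH mean minus normal mean)</p>
<p>=AVERAGE(D2:S2)    ### mean expression across all samples</p>
<p>=T.TEST(D2:L2,M2:S2,2,3)  ### two-tailed t-test with unequal variances, giving the significance of the difference between the two groups</p>
<p>Spot-checking a few values shows they are close to the limma results. They will not match exactly, because limma uses a moderated t-statistic rather than an ordinary t-test.</p>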
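<p>For comparison, the same per-gene calculation can be sketched in base R. The expression values below are made up; only the group sizes (9 NASH, 7 normal) follow the column ranges in the formulas above. Setting var.equal = FALSE requests the Welch test, which corresponds to type 3 in Excel's T.TEST.</p>
<pre>
# Made-up log2 expression values for one gene
nash   = c(7.9, 8.1, 8.4, 7.8, 8.0, 8.3, 8.2, 7.7, 8.5)
normal = c(7.1, 7.3, 6.9, 7.2, 7.0, 7.4, 6.8)

mean_nash   = mean(nash)              # like =AVERAGE(D2:L2)
mean_normal = mean(normal)            # like =AVERAGE(M2:S2)
logfc       = mean_nash - mean_normal # like =T2-U2
ave_expr    = mean(c(nash, normal))   # like =AVERAGE(D2:S2)

# Two-sided Welch t-test, like =T.TEST(D2:L2,M2:S2,2,3)
res = t.test(nash, normal, alternative = 'two.sided', var.equal = FALSE)
print(c(logFC = logfc, AveExpr = ave_expr, P.Value = res$p.value))
</pre>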
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0022.png"><img class="alignnone size-full wp-image-1207" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0022.png" alt="image002" width="494" height="382" /></a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1205.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differential analysis of microarray data with the limma package</title>
		<link>http://www.bio-info-trainee.com/1194.html</link>
		<comments>http://www.bio-info-trainee.com/1194.html#comments</comments>
		<pubDate>Fri, 11 Dec 2015 14:34:55 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[basic software]]></category>
		<category><![CDATA[limma]]></category>
		<category><![CDATA[differential analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1194</guid>
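		<description><![CDATA[Install this R package and read the user's guide. You need to prepare three pieces of data yourself (expression matrix, design matrix, contrast &#8230; <a href="http://www.bio-info-trainee.com/1194.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Install this R package and read the user's guide. You need to prepare three pieces of data yourself (expression matrix, design matrix, contrast matrix), and then three steps (lmFit, eBayes, topTable) do the job.</p>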
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0011.png"><img class="alignnone size-full wp-image-1195" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0011.png" alt="image001" width="1002" height="476" /></a></p>
<p>First, prepare the first piece of data, the gene expression matrix!</p>
<p>You can find the download link on NCBI, then read the file into R:</p>
<p>exprSet=read.table("GSE63067_series_matrix.txt.gz",comment.char = "!",stringsAsFactors=FALSE,header=TRUE)</p>
<p>rownames(exprSet)=exprSet[,1]</p>
<p>exprSet=exprSet[,-1]</p>
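<p>To see what these three lines do without downloading anything, here is a small sketch that writes a stand-in file and reads it back the same way. The file contents (sample IDs, probe values) are made up for illustration; only the leading "!" on metadata lines mimics the series matrix layout.</p>
<pre>
# Write a tiny stand-in file; sample names and values are invented
tmp = tempfile(fileext = '.txt')
writeLines(c('!Series_title\t"toy example"',
             '!series_matrix_table_begin',
             'ID_REF\tGSM0000001\tGSM0000002',
             '1007_s_at\t8.1\t7.9',
             '1053_at\t5.2\t5.6',
             '!series_matrix_table_end'), tmp)

# Lines starting with "!" are skipped as comments
exprSet = read.table(tmp, comment.char = '!', stringsAsFactors = FALSE, header = TRUE)
rownames(exprSet) = exprSet[, 1]   # probe IDs become row names
exprSet = exprSet[, -1]            # drop the ID column, keep expression values only
print(exprSet)
</pre>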
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0021.png"><img class="alignnone size-full wp-image-1196" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0021.png" alt="image002" width="682" height="354" /></a></p>
<p>Next, prepare the design matrix, as follows:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0031.png"><img class="alignnone size-full wp-image-1197" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0031.png" alt="image003" width="294" height="376" /></a></p>
<p>Then prepare the contrast matrix, which states which groups you want to compare with each other, as follows:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0041.png"><img class="alignnone size-full wp-image-1198" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image0041.png" alt="image004" width="542" height="112" /></a></p>
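<p>A contrast matrix is just a table of +1 and -1 weights, one column per comparison. The sketch below builds one in base R for every pair among four hypothetical groups named A to D. These group names are an assumption for illustration only: the post shows its own groups as an image, and four groups simply happen to give six pairwise comparisons.</p>
<pre>
# Hypothetical group names; not the groups used in the post
groups = c('A', 'B', 'C', 'D')
pairs  = combn(groups, 2)   # every unordered pair, one per column
contrast_names = paste(pairs[1, ], pairs[2, ], sep = '-')

# Rows are groups, columns are comparisons:
# +1 for the first group of the pair, -1 for the second
cont.matrix = matrix(0, nrow = length(groups), ncol = ncol(pairs),
                     dimnames = list(groups, contrast_names))
for (k in seq_len(ncol(pairs))) {
  cont.matrix[pairs[1, k], k] = 1
  cont.matrix[pairs[2, k], k] = -1
}
print(cont.matrix)
</pre>
<p>In limma itself, makeContrasts(contrasts = contrast_names, levels = groups) should produce a matrix of the same kind, so the manual loop is only there to show what the function builds.</p>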
<p>Finally, the output:</p>
<p>I made 6 comparisons, so 6 sets of results are written out.</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image005.png"><img class="alignnone size-full wp-image-1199" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image005.png" alt="image005" width="591" height="157" /></a></p>
<p>Finally, open the results and interpret them. The relevant part of the user's guide is shown below!</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image006.png"><img class="alignnone size-full wp-image-1200" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image006.png" alt="image006" width="1026" height="584" /></a></p>
<p>The complete code is on my GitHub.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1194.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
