<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; DESeq</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/deseq/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
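	<description>Feel free to leave comments and join the discussion on the forum biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>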
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Self-taught miRNA-seq analysis, lecture 6: differential expression analysis of miRNAs</title>
		<link>http://www.bio-info-trainee.com/1714.html</link>
		<comments>http://www.bio-info-trainee.com/1714.html#comments</comments>
		<pubDate>Fri, 01 Jul 2016 15:11:26 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[tutorial]]></category>
		<category><![CDATA[DESeq]]></category>
		<category><![CDATA[DESeq2]]></category>
		<category><![CDATA[miRNA-seq]]></category>
		<category><![CDATA[差异分析]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1714</guid>
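		<description><![CDATA[This lecture is the watershed of miRNA-seq data analysis. The previous five lectures covered reading the paper, downloading the data, aligning the reads and then &#8230; <a href="http://www.bio-info-trainee.com/1714.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This lecture is the watershed of miRNA-seq data analysis. The previous five lectures covered reading the paper, downloading the data, aligning the reads, and computing expression levels. That is routine pipeline work: a sequencing company usually delivers those results along with the data, and papers often provide them for download. Analyzing a single sample on its own means little, though. Research usually compares miRNA expression between conditions and then moves on to annotation, functional analysis, and network analysis; that is the real focus, and also the hard part. Here I take the miRNA expression values already processed by the paper's authors to show how the downstream analysis is done, starting with differential analysis: <span id="more-1714"></span>According to the paper, the samples are grouped as follows:</p>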
<blockquote>
<div>GSM1470353: control-CM, experiment1; Homo sapiens; miRNA-Seq SRR1542714</div>
<div>GSM1470354: ET1-CM, experiment1; Homo sapiens; miRNA-Seq SRR1542715</div>
<div>GSM1470355: control-CM, experiment2; Homo sapiens; miRNA-Seq SRR1542716</div>
<div>GSM1470356: ET1-CM, experiment2; Homo sapiens; miRNA-Seq SRR1542717</div>
<div>GSM1470357: control-CM, experiment3; Homo sapiens; miRNA-Seq SRR1542718</div>
<div>GSM1470358: ET1-CM, experiment3; Homo sapiens; miRNA-Seq SRR1542719</div>
<div>So there are sequencing data for 6 samples in two groups: it is simply a comparison of CM cells with and without ET1 stimulation.</div>
</blockquote>
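<div>We also have the expression matrix for these 6 samples, measured as read counts, so the usual choice is a common package such as DESeq2 or edgeR. There are a dozen or so other tools for differential analysis; I simply use the one I am most comfortable with, DESeq2, as the example.</div>
<div>The code below is a bit long. Since I have covered how to use DESeq2 several times in my Bioconductor tutorial series, I only paste the code here. The point is simply that we run the differential analysis and end up with a list of differentially expressed miRNAs.</div>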
<blockquote>
<div>### step8: differential expression analysis by R package for miRNA expression patterns:<br />
## The result reported in the paper is:<br />
## MicroRNA sequencing revealed over 250 known and 34 predicted novel miRNAs to be differentially expressed between ET-1 stimulated and unstimulated control hiPSC-CMs.<br />
## (FDR &lt; 0.1 and 1.5 fold change)<br />
rm(list=ls())<br />
setwd('J:\\miRNA_test\\paper_results')  ## put the paper's result files downloaded from GEO here<br />
sampleIDs=c()<br />
groupList=c()<br />
allFiles=list.files(pattern = '\\.txt$')<br />
i=allFiles[1]<br />
sampleID=strsplit(i,"_")[[1]][1]<br />
treat=strsplit(i,"_")[[1]][4]<br />
dat=read.table(i,stringsAsFactors = F)<br />
colnames(dat)=c('miRNA',sampleID)<br />
groupList=c(groupList,treat)<br />
for (i in allFiles[-1]){<br />
sampleID=strsplit(i,"_")[[1]][1]<br />
treat=strsplit(i,"_")[[1]][4]<br />
a=read.table(i,stringsAsFactors = F)<br />
colnames(a)=c('miRNA',sampleID)<br />
dat=merge(dat,a,by='miRNA')<br />
groupList=c(groupList,treat)<br />
}</div>
<div>### the code above merely merges the 6 separate expression files into one expression matrix<br />
## we need to filter the low expression level miRNA<br />
exprSet=dat[,-1]<br />
rownames(exprSet)=dat[,1]<br />
suppressMessages(library(DESeq2))<br />
exprSet=ceiling(exprSet)<br />
(colData &lt;- data.frame(row.names=colnames(exprSet), groupList=groupList))</div>
<div>## using DESeq2 really is this simple<br />
dds &lt;- DESeqDataSetFromMatrix(countData = exprSet,<br />
colData = colData,<br />
design = ~ groupList)<br />
dds &lt;- DESeq(dds)<br />
png("qc_dispersions.png", 1000, 1000, pointsize=20)<br />
plotDispEsts(dds, main="Dispersion plot")<br />
dev.off()<br />
res &lt;- results(dds)<br />
## draw a few plots, which serve as a kind of QC<br />
png("RAWvsNORM.png")<br />
rld &lt;- rlogTransformation(dds)<br />
exprSet_new=assay(rld)<br />
par(cex = 0.7)<br />
n.sample=ncol(exprSet)<br />
if(n.sample&gt;40) par(cex = 0.5)<br />
cols &lt;- rainbow(n.sample*1.2)<br />
par(mfrow=c(2,2))<br />
boxplot(exprSet,  col = cols,main="expression value",las=2)<br />
boxplot(exprSet_new, col = cols,main="expression value",las=2)<br />
hist(exprSet[,1])<br />
hist(exprSet_new[,1])<br />
dev.off()<br />
library(RColorBrewer)<br />
(mycols &lt;- brewer.pal(8, "Dark2")[1:length(unique(groupList))])</p>
<p># Sample distance heatmap<br />
sampleDists &lt;- as.matrix(dist(t(exprSet_new)))<br />
#install.packages("gplots",repos = "http://cran.us.r-project.org")<br />
library(gplots)<br />
png("qc-heatmap-samples.png", w=1000, h=1000, pointsize=20)<br />
heatmap.2(as.matrix(sampleDists), key=F, trace="none",<br />
col=colorpanel(100, "black", "white"),<br />
ColSideColors=mycols[as.integer(factor(groupList))], RowSideColors=mycols[as.integer(factor(groupList))],<br />
margin=c(10, 10), main="Sample Distance Matrix")<br />
dev.off()</p>
<p>png("MA.png")<br />
DESeq2::plotMA(res, main="DESeq2", ylim=c(-2,2))<br />
dev.off()<br />
## this is the key step: here we get the differential analysis results<br />
resOrdered &lt;- res[order(res$padj),]<br />
resOrdered=as.data.frame(resOrdered)<br />
write.csv(resOrdered,"<span style="color: #ff0000;"><strong>deseq2.results.csv</strong></span>",quote = F)</p>
<p>## a few more plots follow, mainly to look at the differences between samples<br />
library(limma)<br />
plotMDS(log(counts(dds, normalized=TRUE) + 1))<br />
plotMDS(log(counts(dds, normalized=TRUE) + 1) - log(t( t(assays(dds)[["mu"]]) / sizeFactors(dds) ) + 1))<br />
plotMDS( assays(dds)[["counts"]] )  ## raw count<br />
plotMDS( assays(dds)[["mu"]] ) ##- fitted values.</p>
</div>
</blockquote>
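<div>The file-merging loop at the top of the script above can also be written more compactly. The following is only a minimal sketch, assuming every per-sample table has a miRNA column plus one count column; the three small data frames and their values are hypothetical stand-ins for the real files, which would still be read with read.table().</div>

```r
# Hypothetical stand-ins for three per-sample count tables.
s1 <- data.frame(miRNA = c("miR-1", "miR-21", "miR-133a"), GSM1 = c(10, 250, 31))
s2 <- data.frame(miRNA = c("miR-21", "miR-1", "miR-133a"), GSM2 = c(240, 12, 28))
s3 <- data.frame(miRNA = c("miR-1", "miR-133a", "miR-21"), GSM3 = c(9, 35, 260))

# Merge all tables on the shared miRNA column in one call.
merged <- Reduce(function(x, y) merge(x, y, by = "miRNA"), list(s1, s2, s3))

# Turn the result into a numeric matrix with miRNA names as row names.
exprMat <- as.matrix(merged[, -1])
rownames(exprMat) <- merged$miRNA
print(exprMat)
```

<div>As in the original loop, merge() keeps only the miRNAs present in every table, so an miRNA missing from any one file is silently dropped.</div>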
<div>In the end we have the differential analysis results, deseq2.results.csv, and can pick the differential miRNAs that meet our criteria based on FDR and fold change.</div>
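<div>A minimal sketch of that selection step, assuming the thresholds quoted from the paper (FDR below 0.1 and at least 1.5-fold change). The small table and its values are invented for illustration; in practice it would come from something like read.csv("deseq2.results.csv", row.names = 1).</div>

```r
# Hypothetical DESeq2-style result table; the values are invented.
res_df <- data.frame(
  row.names      = c("miR-a", "miR-b", "miR-c", "miR-d", "miR-e"),
  log2FoldChange = c(1.2, -0.9, 0.3, -2.0, 0.8),
  padj           = c(0.01, 0.04, 0.02, 0.50, NA)
)

fdr_cutoff <- 0.1          # FDR threshold quoted from the paper
lfc_cutoff <- log2(1.5)    # 1.5-fold change expressed on the log2 scale

# padj can be NA, so test for that explicitly.
keep <- !is.na(res_df$padj) &
  res_df$padj < fdr_cutoff &
  abs(res_df$log2FoldChange) >= lfc_cutoff

diff_miRNA <- res_df[keep, ]
diff_miRNA$direction <- ifelse(diff_miRNA$log2FoldChange > 0, "up", "down")
print(diff_miRNA)
```

<div>The fold-change cutoff is applied to the absolute log2 value so that up- and down-regulated miRNAs are treated symmetrically.</div>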
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1714.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Differential analysis of RNA-seq data with the R package DESeq2</title>
		<link>http://www.bio-info-trainee.com/1533.html</link>
		<comments>http://www.bio-info-trainee.com/1533.html#comments</comments>
		<pubDate>Mon, 11 Apr 2016 11:21:35 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[基础软件]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[DESeq]]></category>
		<category><![CDATA[DESeq2]]></category>
		<category><![CDATA[RNA-seq]]></category>
		<category><![CDATA[差异分析]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1533</guid>
		<description><![CDATA[I wrote about DESeq before, but that post is now outdated: http://www.bio-info-tra &#8230; <a href="http://www.bio-info-trainee.com/1533.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I wrote about DESeq before, but that post is now outdated: <a href="http://www.bio-info-trainee.com/867.html">http://www.bio-info-trainee.com/867.html</a></p>
<p>Since I am preparing to set up a Chinese Bioconductor community, here is a brief walkthrough of how to use the DESeq2 package.</p>
<p><span id="more-1533"></span></p>
<blockquote><p>library(DESeq2)<br />
library(limma)<br />
library(pasilla)<br />
data(pasillaGenes)<br />
exprSet=counts(pasillaGenes)  ## build the expression matrix<br />
group_list=pasillaGenes$condition  ## build the grouping factor, and that is all</p>
<p>(colData &lt;- data.frame(row.names=colnames(exprSet), group_list=group_list))<br />
dds &lt;- DESeqDataSetFromMatrix(countData = exprSet,<br />
colData = colData,<br />
design = ~ group_list)</p>
<p>## The above is step one: build the dds object, which <span style="color: #ff0000;">needs an expression matrix and a sample grouping table!</span></p>
<div>
<blockquote>
<div>dds2 &lt;- DESeq(dds)  ## step two: just call the DESeq function</div>
<div>resultsNames(dds2)</div>
<div>res &lt;-  results(dds2, contrast=c("group_list","treated","untreated"))</div>
<div>## extract the comparison you want; here we compare the treated group against the untreated group</div>
<div>resOrdered &lt;- res[order(res$padj),]</div>
<div>resOrdered=as.data.frame(resOrdered)</div>
</blockquote>
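<div>As you can see, the package is very easy to use!</div>
<div>It only analyzes raw read counts per gene from RNA-seq; do not feed it an expression matrix that has already been normalized, such as RPKM.</div>
<div>DESeq2's own normalization and transformation is also worth mentioning!</div>
<p>rld &lt;- rlogTransformation(dds2)  ## get the rlog-transformed (DESeq2-normalized) expression matrix<br />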
exprSet_new=assay(rld)<br />
par(cex = 0.7)<br />
n.sample=ncol(exprSet)<br />
if(n.sample&gt;40) par(cex = 0.5)<br />
cols &lt;- rainbow(n.sample*1.2)<br />
par(mfrow=c(2,2))<br />
boxplot(exprSet, col = cols,main="expression value",las=2)<br />
boxplot(exprSet_new, col = cols,main="expression value",las=2)<br />
hist(exprSet)<br />
hist(exprSet_new)</p>
</div>
</blockquote>
<div></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/QQ图片20160411191736.png"><img class="alignnone  wp-image-1534" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/QQ图片20160411191736.png" alt="QQ图片20160411191736" width="586" height="337" /></a></p>
<div>
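<div>The figure makes it clear: the gene read-count matrix from RNA-seq, which is normally very widely dispersed, becomes after this transformation an expression matrix that resembles microarray expression data, so in principle one could even look for differential genes with a plain t-test. Note that DESeq2 itself still tests on the raw counts via DESeq(), and its documentation presents the rlog values mainly for visualization and clustering.</div>
<div></div>
<div>If you have more than two groups, however, things get more complicated: you need to study the vignette carefully, and you may even need to consult whoever designed the experiment or a statistician.</div>
<div></div>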
</div>
</div>
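<div>As background on why raw counts rather than RPKM are expected: DESeq and DESeq2 estimate one size factor per sample with the median-of-ratios method and use it to account for sequencing depth. The following base-R sketch illustrates the idea on a hypothetical toy matrix; it is not the package's actual code, and it ignores the handling of genes with zero counts.</div>

```r
# Hypothetical toy count matrix: 4 genes x 3 samples.
# Sample s2 is sequenced twice as deeply as s1, s3 half as deeply.
counts_toy <- matrix(
  c( 10,  20,  5,
    100, 200, 50,
     40,  80, 20,
      8,  16,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Median-of-ratios: compare each sample with a per-gene geometric mean.
log_geo_means <- rowMeans(log(counts_toy))
size_factors <- apply(counts_toy, 2, function(col) {
  exp(median(log(col) - log_geo_means))
})

# Dividing each column by its size factor puts samples on a common scale.
norm_counts <- sweep(counts_toy, 2, size_factors, "/")
print(round(size_factors, 3))
```

<div>In this toy case the three columns of norm_counts become identical, because the samples differ only in depth.</div>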
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1533.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Source code for differential analysis with DESeq</title>
		<link>http://www.bio-info-trainee.com/867.html</link>
		<comments>http://www.bio-info-trainee.com/867.html#comments</comments>
		<pubDate>Fri, 17 Jul 2015 03:23:58 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[转录组软件]]></category>
		<category><![CDATA[DESeq]]></category>
		<category><![CDATA[源代码]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=867</guid>
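		<description><![CDATA[Make sure the current folder contains the 4 files 742KO1.count etc., i.e. the ones produced by htseq from the &#8230; <a href="http://www.bio-info-trainee.com/867.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Make sure the current folder contains the 4 files 742KO1.count etc., which are the output files produced by running htseq on the aligned bam files.</p>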
<p>library(DESeq)<br />
#load the data<br />
K1=read.table("742KO1.count",row.names=1)<br />
K2=read.table("743KO2.count",row.names=1)<br />
W1=read.table("740WT1.count",row.names=1)<br />
W2=read.table("741WT2.count",row.names=1)<br />
#combine the four samples column by column<br />
data=cbind(K1,K2,W1,W2)<br />
#if these are htseq results, drop the five summary rows at the end of data<br />
n=nrow(data)<br />
data=data[-((n-4):n),]</p>
<p>#if these are bedtools results, extract the count column and the row names<br />
kk1=cbind(K1$V5)<br />
rownames(kk1)=rownames(K1)<br />
K1=kk1</p>
<p>#differential analysis<br />
colnames(data)=c("K1","K2","W1","W2")<br />
type=rep(c("K","W"),c(2,2))<br />
de=newCountDataSet(data,type)<br />
de=estimateSizeFactors(de)<br />
de=estimateDispersions(de)<br />
res=nbinomTest(de,"K","W")</p>
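<p>#res is our expression test result</p>
<p>At this point the differential gene analysis is, in principle, finished! What follows simply combines a few Bioconductor packages in R.</p>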
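<p>One fragile spot in the script above is removing the htseq-count summary rows by position. A possible alternative, sketched here under the assumption that the summary rows carry htseq-count's usual names (with a "__" prefix in newer versions, without it in older ones), is to drop them by name. The gene IDs and counts below are hypothetical.</p>

```r
# Hypothetical htseq-count style table: gene rows followed by summary rows.
data_toy <- data.frame(
  K1 = c(12, 0, 7, 300, 50, 0, 0, 20),
  W1 = c(15, 1, 9, 280, 45, 0, 0, 25),
  row.names = c("ENSMUSG01", "ENSMUSG02", "ENSMUSG03",
                "__no_feature", "__ambiguous", "__too_low_aQual",
                "__not_aligned", "__alignment_not_unique")
)

# Match the five summary rows with or without the "__" prefix.
summary_pattern <- "^(__)?(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)$"
is_summary <- grepl(summary_pattern, rownames(data_toy))

# Keep only real gene rows.
genes_only <- data_toy[!is_summary, , drop = FALSE]
print(genes_only)
```

<p>Filtering by name keeps working when the number of genes in the annotation changes, which a hard-coded row range does not.</p>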
<p>library(org.Mm.eg.db)</p>
<p>tmp=select(org.Mm.eg.db, keys=res$id, columns=c("ENTREZID","SYMBOL"), keytype="ENSEMBL")</p>
<p>#merge res and tmp<br />
res=merge(tmp,res,by.x="ENSEMBL",by.y="id",all=TRUE)</p>
<p>#GO annotation<br />
tmp=select(org.Mm.eg.db, keys=res$ENSEMBL, columns="GO", keytype="ENSEMBL")<br />
ensembl_go=unlist(tapply(tmp[,2],as.factor(tmp[,1]),function(x) paste(x,collapse ="|"),simplify =F))</p>
<p>#add the GO annotation to res<br />
res$go=ensembl_go[res$ENSEMBL]  #add a go column to res</p>
<p>#write all_data<br />
all_res=res<br />
write.csv(res,file="all_data.csv",row.names =F)</p>
<p>uniq=na.omit(res)  #drop genes with any missing value<br />
sort_uniq=uniq[order(uniq$padj),]  #sort by adjusted p-value</p>
<p>#write the sorted table<br />
write.csv(sort_uniq,file="sort_uniq.csv",row.names =F)</p>
<p>#label up- and down-regulated genes (up means higher in K than in W)<br />
sort_uniq$up_down=ifelse(sort_uniq$baseMeanA&gt;sort_uniq$baseMeanB,"up","down")<br />
#move the up_down column to the front<br />
final_res=sort_uniq[,c(12,1:11)]<br />
#write the final table<br />
write.csv(final_res,file="final_annotation_gene_bedtools_sort_uniq.csv",row.names =F)</p>
<p>#then select the genes with padj below 0.05 for enrichment analysis<br />
tmp=select(org.Mm.eg.db, keys=sort_uniq[sort_uniq$padj&lt;0.05,1], columns="ENTREZID", keytype="ENSEMBL")<br />
diff_ENTREZID=tmp$ENTREZID<br />
require(DOSE)<br />
require(clusterProfiler)<br />
diff_ENTREZID=na.omit(diff_ENTREZID)<br />
ego &lt;- enrichGO(gene=diff_ENTREZID,organism="mouse",ont="CC",pvalueCutoff=0.05,readable=TRUE)<br />
ekk &lt;- enrichKEGG(gene=diff_ENTREZID,organism="mouse",pvalueCutoff=0.05,readable=TRUE)<br />
write.csv(summary(ekk),"KEGG-enrich.csv",row.names =F)<br />
write.csv(summary(ego),"GO-enrich.csv",row.names =F)</p>
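<p>The GO annotation step above collapses all GO terms of a gene into one string and then looks the strings up by gene ID. A minimal base-R sketch of that idea follows; the gene IDs and GO terms are made up, and the table only mimics the shape of what select() returns with one row per gene/GO pair.</p>

```r
# Hypothetical table with one row per gene/GO-term pair.
tmp_toy <- data.frame(
  ENSEMBL = c("geneA", "geneA", "geneB", "geneC", "geneC", "geneC"),
  GO      = c("GO:0000001", "GO:0000002", "GO:0000003",
              "GO:0000001", "GO:0000004", "GO:0000005"),
  stringsAsFactors = FALSE
)

# Collapse all GO terms of a gene into one "|"-separated string.
go_by_gene <- vapply(split(tmp_toy$GO, tmp_toy$ENSEMBL),
                     paste, character(1), collapse = "|")

# Look the strings up by gene ID; genes without annotation come back as NA.
query <- c("geneC", "geneA", "geneX")
go_col <- unname(go_by_gene[query])
print(go_col)
```

<p>Because unannotated genes get NA here, a later na.omit() on the whole table also removes every gene without a GO term.</p>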
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/867.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding differentially expressed genes with DESeq in R</title>
		<link>http://www.bio-info-trainee.com/741.html</link>
		<comments>http://www.bio-info-trainee.com/741.html#comments</comments>
		<pubDate>Mon, 18 May 2015 06:24:49 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[生信组学技术]]></category>
		<category><![CDATA[转录组软件]]></category>
		<category><![CDATA[DESeq]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[差异基因]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=741</guid>
		<description><![CDATA[1. Install and load the R package. To install, just use source("http://bioconduct &#8230; <a href="http://www.bio-info-trainee.com/741.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h3>1. Install and load the R package</h3>
<p>To install, just run source("http://bioconductor.org/biocLite.R") ;biocLite("DESeq"). If that fails, you have to download the source package yourself and install it manually.</p>
<p>2. Required data</p>
<p>The package vignette points us to an example dataset:</p>
<p>source("http://bioconductor.org/biocLite.R") ;biocLite("pasilla")</p>
<p>Once the pasilla package is installed, its installation directory contains a table file, which is exactly the input DESeq needs.</p>
<p>C:\Program Files\R\R-3.2.0\library\pasilla\extdata\pasilla_gene_counts.tsv</p>
<p>The vignette puts it this way:</p>
<p>The table cell in the i-th row and the j-th column of the table tells how many reads have been mapped to gene i in sample j.</p>
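<p>Normally we run htseq-count on the sam file of each sample to count reads, and then merge the per-sample counts into one table.</p>
<p>Below is the example data: the first column is the gene ID and each following column is one sample.</p>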
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/图片12.png"><img class="alignnone size-full wp-image-742" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/图片12.png" alt="图片1" width="549" height="405" /></a></p>
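<p>To make the expected layout concrete, here is a small sketch that reads such a table and derives one condition label per column. The inline text is a hypothetical miniature standing in for a file like pasilla_gene_counts.tsv, and the rule for deriving the condition from the column name is an assumption made for this toy example.</p>

```r
# Hypothetical miniature of a tab-separated gene-by-sample count table.
txt <- "gene_id\tuntreated1\tuntreated2\ttreated1\ttreated2
geneA\t0\t0\t0\t0
geneB\t92\t161\t140\t88
geneC\t5\t1\t4\t0"

countTable <- read.table(text = txt, header = TRUE, row.names = 1, sep = "\t")

# One condition label per column, in column order
# (here obtained by stripping the trailing replicate number).
condition <- factor(sub("[0-9]+$", "", colnames(countTable)))

# Cell [i, j] is the number of reads mapped to gene i in sample j.
print(countTable["geneB", "treated1"])
```

<p>A table and a factor shaped like countTable and condition are what the newCountDataSet() call takes as its two arguments.</p>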
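<p>de = newCountDataSet( pasillaCountTable, condition )  #read the gene count table into a cds object according to our sample factor; newCountDataSet exists precisely to build this object</p>
<p>With the de object built, we can start looking for differences right away, in just a few simple steps.</p>
<p>de=estimateSizeFactors(de)</p>
<p>de=estimateDispersions(de)</p>
<p>res=nbinomTest(de,"K","W") #this res table is the most important output</p>
<p>uniq=na.omit(res)</p>
<p>Here I work from the htseq count files of 4 samples of my own; the complete code follows.</p>
<p>library(DESeq)</p>
<p>#first read the htseq counts computed from the bam or sam alignment files</p>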
<p>K1=read.table("742KO1.count",row.names=1)</p>
<p>K2=read.table("743KO2.count",row.names=1)</p>
<p>W1=read.table("740WT1.count",row.names=1)</p>
<p>W2=read.table("741WT2.count",row.names=1)</p>
<p>data=cbind(K1,K2,W1,W2)</p>
<p>data=data[-c(43630:43634),]</p>
<p>#the per-sample counts are now merged into one data frame: columns are samples, rows are genes</p>
<p>colnames(data)=c("K1","K2","W1","W2")</p>
<p>type=rep(c("K","W"),c(2,2))</p>
<p>#build the DESeq object and test gene expression between the groups</p>
<p>de=newCountDataSet(data,type)</p>
<p>de=estimateSizeFactors(de)</p>
<p>de=estimateDispersions(de)</p>
<p>res=nbinomTest(de,"K","W")</p>
<p>#res is our expression test result</p>
<p>library(org.Mm.eg.db)</p>
<p>tmp=select(org.Mm.eg.db, keys=res$id, columns="GO", keytype="ENSEMBL")</p>
<p>ensembl_go=unlist(tapply(tmp[,2],as.factor(tmp[,1]),function(x) paste(x,collapse ="|"),simplify =F))</p>
<p>#first write out all the results, with the GO annotation added</p>
<p>all_res=res</p>
<p>res$go=ensembl_go[res$id]</p>
<p>write.csv(res,file="all_data.csv",row.names =F)</p>
<p>#then write out the informative rows, i.e. drop the genes with no detected expression</p>
<p>uniq=na.omit(res)</p>
<p>sort_uniq=uniq[order(uniq$padj),]</p>
<p>write.csv(sort_uniq,file="sort_uniq.csv",row.names =F)</p>
<p>#then select the differential genes with padj below 0.05 for enrichment, using the two packages by YGC, which I described in detail in earlier posts</p>
<p>tmp=select(org.Mm.eg.db, keys=sort_uniq[sort_uniq$padj&lt;0.05,1], columns="ENTREZID", keytype="ENSEMBL")</p>
<p>diff_ENTREZID=tmp$ENTREZID</p>
<p>require(DOSE)</p>
<p>require(clusterProfiler)</p>
<p>diff_ENTREZID=na.omit(diff_ENTREZID)</p>
<p>ego &lt;- enrichGO(gene=diff_ENTREZID,organism="mouse",ont="CC",pvalueCutoff=0.01,readable=TRUE)</p>
<p>ekk &lt;- enrichKEGG(gene=diff_ENTREZID,organism="mouse",pvalueCutoff=0.01,readable=TRUE)</p>
<p>write.csv(summary(ekk),"KEGG-enrich.csv",row.names =F)</p>
<p>write.csv(summary(ego),"GO-enrich.csv",row.names =F)</p>
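<p>The padj column used for the cutoff above is, as far as I know, the p-value after Benjamini-Hochberg correction for multiple testing. The following sketch shows what that correction does, using base R's p.adjust() and a hand calculation on hypothetical p-values; it is an illustration, not DESeq's own code.</p>

```r
# Hypothetical raw p-values for five genes.
pval <- c(0.001, 0.008, 0.039, 0.041, 0.60)

# Benjamini-Hochberg adjustment as implemented in base R.
padj <- p.adjust(pval, method = "BH")

# The same values by hand: p * n / rank, then a running minimum
# taken from the largest p-value downwards, capped at 1.
n <- length(pval)
ord <- order(pval, decreasing = TRUE)
by_hand <- numeric(n)
by_hand[ord] <- pmin(1, cummin(pval[ord] * n / (n:1)))

print(round(padj, 4))
```

<p>With these numbers four raw p-values are below 0.05 but only two adjusted ones are, which is why filtering on padj rather than pval gives a shorter gene list.</p>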
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/741.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
