<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Clustering</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e8%81%9a%e7%b1%bb/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Leave a comment and join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Classifying samples by gene expression with ConsensusClusterPlus</title>
		<link>http://www.bio-info-trainee.com/945.html</link>
		<comments>http://www.bio-info-trainee.com/945.html#comments</comments>
		<pubDate>Thu, 27 Aug 2015 13:30:19 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[ConsensusClusterPlus]]></category>
		<category><![CDATA[Clustering]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=945</guid>
		<description><![CDATA[All Bioconductor packages are installed the same way: source("http: &#8230; <a href="http://www.bio-info-trainee.com/945.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>
<pre>All Bioconductor packages are installed the same way:</pre>
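<pre># biocLite() was the installer when this post was written; current Bioconductor releases use BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ConsensusClusterPlus")</pre>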
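<p>This is the simplest package I have come across: once the input data is prepared, a single call runs the whole analysis and writes out all results by default.</p></div>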
<div><a href="http://www.bioconductor.org/packages/release/bioc/html/ConsensusClusterPlus.html">http://www.bioconductor.org/packages/release/bioc/html/ConsensusClusterPlus.html</a></div>
<div><a href="http://www.bioconductor.org/packages/release/bioc/vignettes/ConsensusClusterPlus/inst/doc/ConsensusClusterPlus.pdf">http://www.bioconductor.org/packages/release/bioc/vignettes/ConsensusClusterPlus/inst/doc/ConsensusClusterPlus.pdf</a></div>
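<div>Read the package documentation linked above and it is easy to pick up.</div>
<div>All you need is an expression matrix for the samples you want to classify. The expression matrix downloaded with the GEOquery package in the previous post can be analyzed the same way.</div>
<div>The package uses the ALL dataset in its own examples, so we can load that dataset directly and get an expression matrix from it.</div>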
<div>
<div data-canvas-width="121.82599366666666">library(ALL)</div>
<div data-canvas-width="95.72398166666665">data(ALL)</div>
<div data-canvas-width="121.82599366666666">d=exprs(ALL)</div>
<div data-canvas-width="104.4246523333333">d[1:5,1:5]</div>
<div data-canvas-width="104.4246523333333">The data look like this:</div>
<p>&gt; d[1:5,1:5]</p></div>
<pre>             01005    01010    03002    04006    04007
1000_at   7.597323 7.479445 7.567593 7.384684 7.905312
1001_at   5.046194 4.932537 4.799294 4.922627 4.844565
1002_f_at 3.900466 4.208155 3.886169 4.206798 3.416923
1003_s_at 5.903856 6.169024 5.860459 6.116890 5.687997
1004_at   5.925260 5.912780 5.893209 6.170245 5.615210
&gt; dim(d)
[1] 12625   128</pre>
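<div>That is 128 samples and 12,625 probes.</div>
<div>Some papers run the same analysis on a matrix of RPKM values from RNA-seq.</div>
<div>For microarray expression data like this, we usually apply a simple normalization and then keep only the genes or probes that vary most across samples for the clustering.</div>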
<div>
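<div data-canvas-width="182.73068833333346">mads=apply(d,1,mad) # median absolute deviation of each probe across samples</div>
<div data-canvas-width="278.438065666667">d=d[rev(order(mads))[1:5000],] # keep the 5000 most variable probes</div>
<p>d = sweep(d,1, apply(d,1,median,na.rm=T)) # center each probe on its median</p></div>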
<div># Depending on the data, the matrix d can instead be normalized with DESeq's normalization</div>
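<div>The filtering and centering steps above can be checked by hand on a small matrix. This is only a sketch: the matrix m, its values, and the names top and centered are made up for illustration and are not part of the original analysis.</div>
<pre># Toy expression matrix: 4 probes (rows) x 5 samples (columns), made-up values
m = matrix(c(5, 5, 5, 5, 5,
             1, 9, 2, 8, 5,
             4, 5, 6, 5, 4,
             2, 2, 8, 2, 2),
           nrow = 4, byrow = TRUE,
           dimnames = list(paste0("probe", 1:4), paste0("s", 1:5)))

# Median absolute deviation per probe; probe4 has a single outlier, so its MAD is still 0
mads = apply(m, 1, mad)

# Keep the 2 most variable probes (the post keeps 5000)
top = m[rev(order(mads))[1:2], ]

# Subtract each probe's median so every row is centered on 0
centered = sweep(top, 1, apply(top, 1, median, na.rm = TRUE))
print(rownames(top))
print(centered)</pre>
<div>Because MAD is based on medians, a probe with one extreme sample is not treated as highly variable, which is one reason to prefer it over the variance for this kind of filter.</div>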
<div>
<div data-canvas-width="269.7373950000003">library(ConsensusClusterPlus)</div>
<div data-canvas-width="147.9280056666667">#title=tempdir() # usually you would change this to your own directory</div>
<div data-canvas-width="147.9280056666667">title="./" # all figures and data are written here</div>
<div data-canvas-width="617.7642216666677">results = ConsensusClusterPlus(d,maxK=6,reps=50,pItem=0.8,pFeature=1,</div>
<div data-canvas-width="713.4715990000012"> title=title,clusterAlg="hc",distance="pearson",seed=1262118388.71279,plot="png")</div>
<div data-canvas-width="713.4715990000012">That is all. More than 9 images are written to the directory you specified.</div>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/08/clipboard.png"><img class="alignnone size-full wp-image-946" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/08/clipboard.png" alt="clipboard" width="358" height="188" /></a></div>
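<div>The package vignette explains what each output file is.</div>
<div>Many of the parameters need tuning. maxK=6 is normally chosen from what the experimental design suggests; if your samples should fall into more than 6 classes, increase maxK accordingly.</div>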
<div>Inspect results[[2]][["consensusClass"]] to see which class each sample was assigned to when k=2</div>
<div>results[[3]][["consensusClass"]]</div>
<div>results[[4]][["consensusClass"]] and so on</div>
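<div>As a rough illustration of reading the class assignments, the sketch below runs the same call on a small simulated matrix. The data, the object names toy, res and cls2, and the reduced maxK and reps values are assumptions chosen to keep the run short; they are not the post's settings. Plots are written as a PDF into tempdir().</div>
<pre>library(ConsensusClusterPlus)

# Simulated data (an assumption, not from the post): 100 features x 20 samples,
# with the first 10 samples shifted upward in the first 50 features
set.seed(1)
toy = matrix(rnorm(100 * 20), nrow = 100,
             dimnames = list(paste0("g", 1:100), paste0("s", 1:20)))
toy[1:50, 1:10] = toy[1:50, 1:10] + 3

res = ConsensusClusterPlus(toy, maxK = 4, reps = 20, pItem = 0.8, pFeature = 1,
                           title = tempdir(), clusterAlg = "hc",
                           distance = "pearson", seed = 1, plot = "pdf")

# The solution for k clusters is stored at index k, starting from k = 2
cls2 = res[[2]][["consensusClass"]]

# Number of samples per class for each k
for (k in 2:4) print(table(res[[k]][["consensusClass"]]))</pre>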
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/945.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cluster analysis in R with hclust</title>
		<link>http://www.bio-info-trainee.com/903.html</link>
		<comments>http://www.bio-info-trainee.com/903.html#comments</comments>
		<pubDate>Tue, 21 Jul 2015 01:09:02 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Plotting]]></category>
		<category><![CDATA[Clustering]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=903</guid>
		<description><![CDATA[Clustering starts from the pairwise distances between all elements. We first generate some example data: x=ru &#8230; <a href="http://www.bio-info-trainee.com/903.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Clustering starts from the pairwise distances between all elements. We first generate some example data:</p>
<p>x=runif(10)</p>
<p>y=runif(10)</p>
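<p>S=cbind(x,y)                                 # combine into a 10 x 2 matrix</p>
<p>rownames(S)=paste("Name",1:10)                # name the rows so the points can be identified (no trailing space)</p>
<p>out.dist=dist(S,method="euclidean")           # turn the coordinates into pairwise distances</p>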
<p>Running this code gives a matrix S like the following (the values are random, so yours will differ):</p>
<pre>&gt; S
                 x         y
Name 1  0.41517985 0.4697017
Name 2  0.35653781 0.1132367
Name 3  0.52253349 0.3680286
Name 4  0.80558684 0.9834687
Name 5  0.04564145 0.8560690
Name 6  0.11044397 0.2988598
Name 7  0.34984447 0.8515141
Name 8  0.28097709 0.1260050
Name 9  0.81771888 0.5976135
Name 10 0.40700158 0.5236567</pre>
<p>There are 10 points with known x and y coordinates, and six distance measures we can use to compute the distance matrix from them.</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0012.png"><img class="alignnone size-full wp-image-904" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0012.png" alt="image001" width="872" height="363" /></a></p>
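<p>Note: distance measures between two points used in clustering include:</p>
<p>1. Manhattan (absolute) distance: manhattan</p>
<p>2. Euclidean distance: euclidean (the default)</p>
<p>3. Minkowski distance: minkowski</p>
<p>4. Chebyshev distance: called maximum in R's dist()</p>
<p>5. Mahalanobis distance: not a dist() method; R provides a separate mahalanobis() function</p>
<p>6. Canberra (Lance-Williams) distance: canberra</p>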
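<p>To make the difference between these measures concrete, here is a small sketch on two made-up points, (0, 0) and (3, 4), picked so each distance can be verified by hand. Note that R's dist() names the Chebyshev distance "maximum", and that the Mahalanobis distance comes from the separate mahalanobis() function, which returns a squared distance. The variable names are illustrative only.</p>
<pre># Two made-up points, a = (0, 0) and b = (3, 4)
P = rbind(a = c(0, 0), b = c(3, 4))

d_euclidean = as.numeric(dist(P, method = "euclidean"))         # sqrt(3^2 + 4^2) = 5
d_manhattan = as.numeric(dist(P, method = "manhattan"))         # 3 + 4 = 7
d_chebyshev = as.numeric(dist(P, method = "maximum"))           # max(3, 4) = 4
d_minkowski = as.numeric(dist(P, method = "minkowski", p = 3))  # (3^3 + 4^3)^(1/3)
d_canberra  = as.numeric(dist(P, method = "canberra"))          # 3/3 + 4/4 = 2

# With an identity covariance matrix the Mahalanobis distance equals the euclidean one
d_mahal = sqrt(mahalanobis(P["b", ], center = P["a", ], cov = diag(2)))

print(c(euclidean = d_euclidean, manhattan = d_manhattan, chebyshev = d_chebyshev,
        minkowski = d_minkowski, canberra = d_canberra, mahalanobis = d_mahal))</pre>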
<p>The distances here were computed with the default method, euclidean.</p>
<p>Once the distances are computed, we can cluster.</p>
<p>out.hclust=hclust(out.dist,method="complete") # cluster based on the distances</p>
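<p>Note: there are also several clustering (linkage) methods:</p>
<p>1. Average linkage: average</p>
<p>2. Centroid method: centroid</p>
<p>3. Median method: median</p>
<p>4. Complete linkage (farthest neighbor): complete (the default)</p>
<p>5. Single linkage (nearest neighbor): single</p>
<p>6. Ward's minimum-variance method: ward.D or ward.D2 (called ward in older versions of R)</p>
<p>7. Density estimation method: density (not available in R's hclust())</p>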
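<p>The linkage method decides how the distance between two clusters is derived from the distances between their members. A minimal sketch on three made-up points on a line (at 0, 1 and 3) shows how the merge heights differ; the names pts, dd and h_* are illustrative only.</p>
<pre># Three made-up points on a line at 0, 1 and 3
pts = matrix(c(0, 1, 3), ncol = 1, dimnames = list(c("a", "b", "c"), "x"))
dd = dist(pts)   # pairwise distances: a-b = 1, b-c = 2, a-c = 3

# a and b always merge first at height 1; the second merge height depends on the linkage
h_single   = hclust(dd, method = "single")$height    # 1, then min(2, 3) = 2
h_complete = hclust(dd, method = "complete")$height  # 1, then max(2, 3) = 3
h_average  = hclust(dd, method = "average")$height   # 1, then mean(2, 3) = 2.5

print(rbind(single = h_single, complete = h_complete, average = h_average))</pre>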
<p>Next, plot the clustering result:</p>
<p>plot(out.hclust)                              # draw the dendrogram; plclust() is no longer available in current versions of R</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0033.png"><img class="alignnone size-full wp-image-905" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0033.png" alt="image003" width="765" height="477" /></a></p>
<p>rect.hclust(out.hclust,k=3)                   # draw rectangles around the 3 clusters</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0052.png"><img class="alignnone size-full wp-image-906" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/07/image0052.png" alt="image005" width="783" height="500" /></a></p>
<p>out.id=cutree(out.hclust,k=3)                 # cluster label for each point when cutting into 3 clusters</p>
<p>Here out.id is a vector that assigns every point to cluster 1, 2 or 3.</p>
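<p>Since the example data above are random, here is a deterministic sketch of the same workflow on six made-up points that form two obvious groups, so the result of cutree() can be predicted in advance. The names pts, hc and id are assumptions for illustration.</p>
<pre># Six made-up points forming two well-separated groups
pts = rbind(c(0, 0), c(0, 1), c(1, 0),
            c(10, 10), c(10, 11), c(11, 10))
rownames(pts) = paste("Name", 1:6)

hc = hclust(dist(pts), method = "complete")
id = cutree(hc, k = 2)      # named integer vector: one cluster label per point

print(table(id))            # number of points in each cluster
print(split(names(id), id)) # point names grouped by cluster</pre>
<p>cutree() numbers the clusters in the order in which they first appear in the data, so the first three points get label 1 and the last three get label 2.</p>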
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/903.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
