<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; stringDB</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/stringdb/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>PPI analysis with the Bioconductor R package STRINGdb</title>
		<link>http://www.bio-info-trainee.com/2041.html</link>
		<comments>http://www.bio-info-trainee.com/2041.html#comments</comments>
		<pubDate>Wed, 23 Nov 2016 11:37:37 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[PPI]]></category>
		<category><![CDATA[string]]></category>
		<category><![CDATA[stringDB]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2041</guid>
		<description><![CDATA[PPI analysis essentially takes a list of proteins or genes of interest (a few hundred or even several thousand) and looks them up in a PP &#8230; <a href="http://www.bio-info-trainee.com/2041.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
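				<content:encoded><![CDATA[<p>PPI analysis essentially takes a list of proteins or genes of interest (a few hundred or even several thousand) and looks up the interactions among them in a PPI database.</p>
<div>The star of this post is STRINGdb which, as the name suggests, is built on the well-known STRING database.</div>
<div>Paper: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383874/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383874/</a></div>
<div>Home page: <a href="http://string-db.org/cgi/input.pl">http://string-db.org/cgi/input.pl</a></div>
<div>I had assumed I would need to upload my genes to the website for analysis, but the STRING team also provides an R package: <a href="http://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html">http://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html</a>. I prefer to solve problems programmatically, so I spent some time learning the package, and it works very well.</div>
<div>It only needs a three-column data.frame holding logFC, p-value and gene ID, which is exactly the standard output of a differential expression analysis.</div>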
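<div>As a minimal sketch of that three-column input, the table below is built by hand. The column names follow the package's example data (pvalue, logFC, gene); the values and the object name de_results are made up for illustration.</div>

```r
# Hypothetical differential expression results; every value here is invented.
de_results <- data.frame(
  pvalue = c(0.0001, 0.0004, 0.0300),
  logFC  = c(3.3, -2.1, 0.8),
  gene   = c("TP53", "ATM", "BRCA1"),
  stringsAsFactors = FALSE
)

# Inspect the structure: two numeric columns and one character column.
str(de_results)
```

<div>A table shaped like this is what would be handed to string_db$map, with "gene" named as the column to map.</div>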
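<div>The string_db$map function then adds a column with the STRING protein ID, and string_db$add_diff_exp_color adds a color column.</div>
<div>string_db$plot_network draws the network and needs only the STRING protein IDs. To color the proteins, first call string_db$post_payload to associate each protein with its color, then pass the result to the plotting call.</div>
<div><strong><span style="color: #ff0000;">You can also call get_interactions directly to retrieve all the PPI data</span></strong>, write it to a local file, and import it into Cytoscape for plotting.</div>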
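<div>The export step for Cytoscape could look roughly like the sketch below. To keep it runnable without a network connection, the edge table is a hand-made stand-in; the column names from, to and combined_score are assumed to match what get_interactions returns, and the protein names, scores and file name are hypothetical.</div>

```r
# Hypothetical edge table. In practice it would come from something like
#   edges <- string_db$get_interactions(example1_mapped$STRING_id[1:200])
# and the column names used here are an assumption about that output.
edges <- data.frame(
  from = c("protein_A", "protein_A", "protein_B"),
  to   = c("protein_B", "protein_C", "protein_C"),
  combined_score = c(950, 720, 150),
  stringsAsFactors = FALSE
)

# Keep only the more confident edges; 400 is a common medium-confidence
# cutoff on STRING's 0-1000 scale, but the choice is up to you.
edges_kept <- edges[edges$combined_score >= 400, ]

# Write a tab-separated file that Cytoscape can import as a network table.
out_file <- file.path(tempdir(), "string_edges.tsv")
write.table(edges_kept, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

<div>In Cytoscape the file can then be loaded through File, Import, Network from File, choosing from as the source column and to as the target column.</div>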
<div></div>
<p><span id="more-2041"></span></p>
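<div>There are also a few smaller features that are probably of little use to me but suit beginners well: from STRING protein IDs alone you can run GO/KEGG enrichment analysis, look up the interaction between two proteins, find papers reporting a direct interaction between two proteins, and find a protein's homologs in other species.</div>
<div>The package needs to download several data files while it runs and, annoyingly, it downloads them again every time, because by default it stores them in the system's temporary folder.</div>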
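<div>One way around the repeated downloads is probably to point input_directory at a persistent folder instead of leaving it empty. The sketch below assumes that files already present in that folder are reused; the folder name stringdb_cache and the wrapper new_string_db are hypothetical, and the constructor arguments simply mirror the ones used in this post.</div>

```r
# Hypothetical cache location; any writable, persistent directory would do.
cache_dir <- file.path(path.expand("~"), "stringdb_cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# Thin wrapper around the constructor used in this post. The only change is
# that input_directory points at the persistent folder instead of "".
new_string_db <- function(input_directory) {
  STRINGdb::STRINGdb$new(version = "10", species = 9606,
                         score_threshold = 0,
                         input_directory = input_directory)
}

# Creating the object needs the STRINGdb package and network access:
# string_db <- new_string_db(cache_dir)
```

<div>After the first run, the downloaded files should be visible with list.files(cache_dir).</div>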
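<div>All the network plots are essentially heavily customized igraph graphics, including the clustering methods used later. You may also want to combine the results with Cytoscape's MCODE plugin to find hub genes.</div>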
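<div>Since the package builds on igraph, a rough first look at hub candidates can be had without leaving R. The sketch below ranks nodes by degree, which is a much simpler technique than MCODE and not a replacement for it. The edge list is hypothetical; a real graph could presumably be obtained from string_db$get_subnetwork(hits), which is an assumption not covered in this post.</div>

```r
library(igraph)

# Hypothetical edge list; the protein names are placeholders.
edges <- data.frame(
  from = c("A", "A", "A", "B", "C"),
  to   = c("B", "C", "D", "C", "E"),
  stringsAsFactors = FALSE
)

# Build an undirected graph from the edge list.
g <- graph_from_data_frame(edges, directed = FALSE)

# Degree = number of interaction partners; sort so candidates come first.
hub_rank <- sort(degree(g), decreasing = TRUE)
head(hub_rank, 3)
```

<div>Degree only counts direct partners, so densely connected modules of the kind MCODE reports still need a dedicated clustering step.</div>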
<div><a href="http://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html">Running the code below once is basically enough to understand the package:</a> <a href="http://www.bioconductor.org/packages/release/bioc/vignettes/STRINGdb/inst/doc/STRINGdb.R">http://www.bioconductor.org/packages/release/bioc/vignettes/STRINGdb/inst/doc/STRINGdb.R</a></div>
<div></div>
<div>library(STRINGdb)</div>
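<div>## The package's help pages are not written with roxygen2. Instead, all functions are stored as methods on the string_db object and are called with the $ operator; their documentation can be viewed the same way.</div>
<div></div>
<div>## First choose the species and the database version.</div>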
<div>string_db &lt;- STRINGdb$new( version="10", species=9606,</div>
<div>score_threshold=0, input_directory="" )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 3: help</div>
<div>###################################################</div>
<div>STRINGdb$methods() # To list all the methods available.</div>
<div>STRINGdb$help("get_graph") # To visualize their documentation.</div>
<div>## This lists every method the package provides and shows the help page for a specific one.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 4: load_data</div>
<div>###################################################</div>
<div>data(diff_exp_example1)</div>
<div>head(diff_exp_example1)</div>
<div>## A test dataset with three columns, as shown below:</div>
<div># pvalue logFC gene</div>
<div># 0.0001018 3.333461 VSTM2L</div>
<div># 0.0001392 3.822383 TBC1D2</div>
<div># This is typically the result of a differential expression analysis.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 5: map</div>
<div>###################################################</div>
<div>example1_mapped &lt;- string_db$map( diff_exp_example1, "gene", removeUnmappedRows = TRUE )</div>
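<div>## Our differential expression results are keyed by gene, so they must be mapped to STRING protein IDs.</div>
<div>STRINGdb$help("map")</div>
<div># Check the help page to see how map is used and what it returns.</div>
<div># In short, it looks up a STRING protein ID for each entry in the gene column of the input data.frame and returns the data.frame with one extra column.</div>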
<div></div>
<div>###################################################</div>
<div>### code chunk number 6: STRINGdb.Rnw:118-121</div>
<div>###################################################</div>
<div>options(SweaveHooks=list(fig=function()</div>
<div>par(mar=c(2.1, 0.1, 4.1, 2.1))))</div>
<div>#par(mar=c(1.1, 0.1, 4.1, 2.1))))</div>
<div>## Set the plot margins; nothing much to explain here.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 7: get_hits</div>
<div>###################################################</div>
<div>hits &lt;- example1_mapped$STRING_id[1:200]</div>
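<div># Here we simply take the first 200 proteins for the next steps.</div>
<div>## Keep in mind that this selection is arbitrary; in a real analysis you should pick your own set of differentially expressed genes.</div>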
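<div>Picking your own differentially expressed genes instead of the first 200 rows could be sketched as follows. The table stands in for the output of string_db$map; its values, the thresholds and the name custom_hits are hypothetical.</div>

```r
# Hypothetical mapped table with the columns used in this post.
mapped <- data.frame(
  pvalue    = c(0.001, 0.020, 0.200, 0.004),
  logFC     = c(2.5, -1.8, 3.0, 0.3),
  gene      = c("G1", "G2", "G3", "G4"),
  STRING_id = c("id1", "id2", "id3", "id4"),
  stringsAsFactors = FALSE
)

# Keep genes that are both significant and changed at least two-fold.
# The cutoffs (p < 0.05, |logFC| >= 1) are arbitrary example choices.
de <- subset(mapped, pvalue < 0.05 & abs(logFC) >= 1)

# STRING IDs to pass on to plot_network, get_enrichment and so on.
custom_hits <- de$STRING_id
```

<div>With the real data, custom_hits would take the place of hits in the calls that follow.</div>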
<div>###################################################</div>
<div>### code chunk number 8: plot_network</div>
<div>###################################################</div>
<div>string_db$plot_network( hits )</div>
<div></div>
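<div>## Protein IDs are all you need to draw the network; the more IDs, the longer it takes.</div>
<div>## The function looks up all PPI data for the input ID list in the STRING database and then draws the network.</div>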
<div>## STRINGdb$help("plot_network")</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 9: add_diff_exp_color</div>
<div>###################################################</div>
<div># filter by p-value and add a color column</div>
<div># (i.e. green for down-regulated genes and red for up-regulated genes)</div>
<div>example1_mapped_pval05 &lt;- string_db$add_diff_exp_color( subset(example1_mapped, pvalue&lt;0.05),</div>
<div>logFcColStr="logFC" )</div>
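<div>## The plain network above is usually not enough. For instance, we may want to see whether each gene is up- or down-regulated and how strongly, which can be encoded as shades of red and green.</div>
<div>## The object returned by add_diff_exp_color is still a data.frame, with an added color column.</div>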
<div>STRINGdb$help("add_diff_exp_color")</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 10: post_payload</div>
<div>###################################################</div>
<div># post payload information to the STRING server</div>
<div>payload_id &lt;- string_db$post_payload( example1_mapped_pval05$STRING_id,</div>
<div>colors=example1_mapped_pval05$color )</div>
<div></div>
<div>## add_diff_exp_color added a color column to our data.frame; post_payload is still needed to associate the STRING protein IDs with those colors. It returns a payload_id that is passed to the plotting function.</div>
<div>STRINGdb$help("post_payload")</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 11: plot_halo_network</div>
<div>###################################################</div>
<div># display a STRING network png with the "halo"</div>
<div>string_db$plot_network( hits, payload_id=payload_id )</div>
<div></div>
<div>## This draws the network again, now with the color attribute added.</div>
<div>## As you can see, with this many genes the plot gets quite crowded.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 13: plot_ppi_enrichment</div>
<div>###################################################</div>
<div># plot the enrichment for the best 1000 genes</div>
<div>string_db$plot_ppi_enrichment( example1_mapped$STRING_id[1:1000], quiet=TRUE )</div>
<div>STRINGdb$help("plot_ppi_enrichment")</div>
<div>## I did not understand what this code is doing.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 14: enrichment</div>
<div>###################################################</div>
<div>enrichmentGO &lt;- string_db$get_enrichment( hits, category = "Process", methodMT = "fdr", iea = TRUE )</div>
<div>enrichmentKEGG &lt;- string_db$get_enrichment( hits, category = "KEGG", methodMT = "fdr", iea = TRUE )</div>
<div>head(enrichmentGO, n=7)</div>
<div>head(enrichmentKEGG, n=7)</div>
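<div>### This runs the enrichment analysis directly from STRING protein IDs; the function downloads some data automatically. By default the background is the whole human proteome, but in most cases it should be changed, otherwise the p-values will not be accurate.</div>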
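<div>Why the background matters can be illustrated with a small calculation. The sketch below assumes the enrichment is scored with a hypergeometric over-representation test, which this post does not state explicitly; all counts and the helper name enrich_p are hypothetical.</div>

```r
# P(X >= k): probability of seeing at least k annotated genes among n hits,
# when K of the N background genes carry the annotation.
enrich_p <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# 200 hits, 20 of them annotated to the term of interest.
# Whole-genome background: 400 of 20000 genes annotated (2 percent).
p_genome <- enrich_p(k = 20, n = 200, K = 400, N = 20000)

# Restricted background, e.g. only genes detected in the experiment:
# 150 of 2000 genes annotated (7.5 percent).
p_restricted <- enrich_p(k = 20, n = 200, K = 150, N = 2000)

c(genome = p_genome, restricted = p_restricted)
```

<div>The same 20 annotated hits look far more surprising against the whole genome than against the restricted background, which is why the background should match the set of genes that could actually have been detected.</div>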
<div></div>
<div>#################################################</div>
<div># code chunk number 15: background (eval = FALSE)</div>
<div>#################################################</div>
<div># Change the background here: human has over twenty thousand genes, but the background is now reduced to just 2000.</div>
<div>backgroundV &lt;- example1_mapped$STRING_id[1:2000] # as an example, we use the first 2000 genes</div>
<div>string_db$set_background(backgroundV)</div>
<div>## string_db is a global variable. It was created for human with STRING version 10 and has now been modified just for this test, so be sure to change it back!</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 16: new_background_inst (eval = FALSE)</div>
<div>###################################################</div>
<div>string_db &lt;- STRINGdb$new( score_threshold=0, backgroundV = backgroundV )</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 17: enrichmentHeatmap (eval = FALSE)</div>
<div>###################################################</div>
<div>eh &lt;- string_db$enrichment_heatmap( list( hits[1:100], hits[101:200]),</div>
<div>list("list1","list2"), title="My Lists" )</div>
<div></div>
<div>## Now let's restore string_db.</div>
<div>string_db &lt;- STRINGdb$new( version="10", species=9606,</div>
<div>score_threshold=0, input_directory="" )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 18: clustering1</div>
<div>###################################################</div>
<div># get clusters</div>
<div>clustersList &lt;- string_db$get_clusters(example1_mapped$STRING_id[1:600])</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 19: STRINGdb.Rnw:254-256</div>
<div>###################################################</div>
<div>options(SweaveHooks=list(fig=function()</div>
<div>par(mar=c(2.1, 0.1, 4.1, 2.1))))</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 20: clustering2</div>
<div>###################################################</div>
<div># plot first 4 clusters</div>
<div>par(mfrow=c(2,2))</div>
<div>for(i in seq_len(4)){</div>
<div>string_db$plot_network(clustersList[[i]])</div>
<div>}</div>
<div>## This draws the 4 clusters on the same canvas.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 21: proteins</div>
<div>###################################################</div>
<div>string_proteins &lt;- string_db$get_proteins()</div>
<div></div>
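<div>## Below are a few other small utilities, such as finding the interaction between two proteins, the papers reporting a direct interaction between two proteins, and the homologs of a protein in other species.</div>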
<div></div>
<div>###################################################</div>
<div>### code chunk number 22: atmtp</div>
<div>###################################################</div>
<div>tp53 = string_db$mp( "tp53" )</div>
<div>atm = string_db$mp( "atm" )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 23: neighbors (eval = FALSE)</div>
<div>###################################################</div>
<div>## string_db$get_neighbors( c(tp53, atm) )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 24: interactions</div>
<div>###################################################</div>
<div>string_db$get_interactions( c(tp53, atm) )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 25: pubmedInteractions (eval = FALSE)</div>
<div>###################################################</div>
<div>## string_db$get_pubmed_interaction( tp53, atm )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 26: homologs (eval = FALSE)</div>
<div>###################################################</div>
<div>## # get the reciprocal best hits of the following protein in all the STRING species</div>
<div>## string_db$get_homologs_besthits(tp53, symbets = TRUE)</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 27: homologs2 (eval = FALSE)</div>
<div>###################################################</div>
<div>## # get the homologs of the following two proteins in the mouse (i.e. species_id=10090)</div>
<div>## string_db$get_homologs(c(tp53, atm), target_species_id=10090, bitscore_threshold=60 )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 28: benchmark1</div>
<div>###################################################</div>
<div>data(interactions_example)</div>
<div></div>
<div>interactions_benchmark = string_db$benchmark_ppi(interactions_example, pathwayType = "KEGG",</div>
<div>max_homology_bitscore = 60, precision_window = 400, exclude_pathways = "blacklist")</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 29: STRINGdb.Rnw:391-393</div>
<div>###################################################</div>
<div>options(SweaveHooks=list(fig=function()</div>
<div>par(mar=c(4.1, 4.1, 4.1, 2.1))))</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 30: benchmark2</div>
<div>###################################################</div>
<div>plot(interactions_benchmark$precision, ylim=c(0,1), type="l", xlim=c(0,700),</div>
<div>xlab="interactions", ylab="precision")</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 31: benchmark3</div>
<div>###################################################</div>
<div>interactions_pathway_view = string_db$benchmark_ppi_pathway_view(interactions_benchmark, precision_threshold=0.2, pathwayType = "KEGG")</div>
<div>head(interactions_pathway_view)</div>
<div></div>
<div></div>
<div></div>
<div></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2041.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
