<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; PPI</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/ppi/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee.</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>PPI analysis with the Bioconductor R package STRINGdb</title>
		<link>http://www.bio-info-trainee.com/2041.html</link>
		<comments>http://www.bio-info-trainee.com/2041.html#comments</comments>
		<pubDate>Wed, 23 Nov 2016 11:37:37 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[PPI]]></category>
		<category><![CDATA[string]]></category>
		<category><![CDATA[stringDB]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2041</guid>
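		<description><![CDATA[PPI analysis essentially takes a set of proteins or genes of interest (a few hundred or even more than a thousand) and goes to the PP &#8230; <a href="http://www.bio-info-trainee.com/2041.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>PPI analysis essentially takes a set of proteins or genes of interest (a few hundred or even more than a thousand) and looks up, in a PPI database, the interactions that involve those proteins or genes.</p>
<div>The tool covered here is STRINGdb which, as the name suggests, is built on the well-known STRING database.</div>
<div>Paper: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383874/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383874/</a></div>
<div>Homepage: <a href="http://string-db.org/cgi/input.pl">http://string-db.org/cgi/input.pl</a></div>
<div>I had assumed I would have to upload my genes to the website for analysis, but it turns out they have also developed an R package; its homepage is <a href="http://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html">http://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html</a>. Since I prefer to solve problems with code, I spent some time learning the package, and it works very well.</div>
<div>All it needs is a three-column data.frame holding logFC, p-value, and gene ID, which is exactly what a standard differential expression analysis produces.</div>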
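<div>As a minimal sketch of that input, the base R snippet below builds a toy table with the same three columns as the package's example data. The gene symbols and numbers are made up for illustration. The column names (pvalue, logFC, gene) simply follow the example dataset shown later in this post; they are an assumption rather than a requirement, since the column names are passed to the package functions as arguments.</div>
<pre>
# Toy differential-expression table: one row per gene (all values are invented)
de_table &lt;- data.frame(
  pvalue = c(0.0001, 0.0300, 0.2000),
  logFC  = c(3.3, -1.8, 0.2),
  gene   = c("TP53", "ATM", "BRCA1"),
  stringsAsFactors = FALSE
)

# Inspect the structure: three rows, three columns
str(de_table)
</pre>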
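<div>The string_db$map function then adds a column with the STRING protein ID, and string_db$add_diff_exp_color adds a color column.</div>
<div>string_db$plot_network draws the network and only needs STRING protein IDs. To give proteins different colors, first call string_db$post_payload to attach a color to each protein, then draw the network.</div>
<div><strong><span style="color: #ff0000;">You can also call get_interactions directly to obtain all the PPI data</span></strong>, write it to a local file, and import it into Cytoscape for plotting.</div>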
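<div>The export step can be sketched as follows. To keep the snippet runnable without a network connection, a toy edge table stands in for the result of string_db$get_interactions(hits); its column names (from, to, combined_score) are an assumption about that result, the protein IDs and scores are made up, and export_edges and the file name are hypothetical.</div>
<pre>
# Write an edge table as tab-delimited text, a format Cytoscape can import
export_edges &lt;- function(ppi, path) {
  write.table(ppi, file = path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(path)
}

# Toy stand-in for string_db$get_interactions(hits); IDs and scores are invented
toy_ppi &lt;- data.frame(
  from = c("9606.ENSP00000000001", "9606.ENSP00000000001"),
  to = c("9606.ENSP00000000002", "9606.ENSP00000000003"),
  combined_score = c(900, 450),
  stringsAsFactors = FALSE
)

# Hypothetical output location inside the session's temporary folder
out_file &lt;- export_edges(toy_ppi, file.path(tempdir(), "string_ppi_edges.tsv"))
</pre>
<div>With real data, the toy table would be replaced by the data.frame returned by get_interactions, and the file would be written somewhere permanent.</div>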
<div></div>
<p><span id="more-2041"></span></p>
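<div>There are also a few small features that I may not need myself but that suit beginners: with nothing but STRING protein IDs you can run GO/KEGG enrichment analysis, look up the interaction between two proteins, find the papers reporting a direct interaction between two proteins, and find a protein's homologs in other species.</div>
<div>The package has to download several files while it runs and, annoyingly, it downloads them again every time, because by default it stores them in the computer's temporary folder.</div>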
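<div>One way around the repeated downloads, sketched here under assumptions, is to point input_directory at a folder that persists between sessions instead of leaving it empty. The folder name is hypothetical, and version "10" is simply the one used in this post; newer releases of the package may only accept newer STRING versions.</div>
<pre>
library(STRINGdb)

# Hypothetical cache folder; any writable directory that survives restarts will do
cache_dir &lt;- file.path(path.expand("~"), "stringdb_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Files fetched by the package are looked up in, and saved to, cache_dir
string_db &lt;- STRINGdb$new(version = "10", species = 9606,
                          score_threshold = 0, input_directory = cache_dir)
</pre>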
<div>All the network plots are essentially heavy customizations built on igraph, including the clustering methods shown later; you may also want to combine them with Cytoscape's MCODE plugin to find hub genes.</div>
<div><a href="http://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html">Running the code below once is basically enough to understand the package: </a><a href="http://www.bioconductor.org/packages/release/bioc/vignettes/STRINGdb/inst/doc/STRINGdb.R">http://www.bioconductor.org/packages/release/bioc/vignettes/STRINGdb/inst/doc/STRINGdb.R</a></div>
<div></div>
<div>library(STRINGdb)</div>
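<div>## The package does not use roxygen2 for its help pages; all functions live inside the string_db object and are called with the $ operator, and each function's help can still be viewed.</div>
<div></div>
<div>## First choose the species and the database version.</div>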
<div>string_db &lt;- STRINGdb$new( version="10", species=9606,</div>
<div>score_threshold=0, input_directory="" )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 3: help</div>
<div>###################################################</div>
<div>STRINGdb$methods() # To list all the methods available.</div>
<div>STRINGdb$help("get_graph") # To visualize their documentation.</div>
<div>## Lists every function in the package, and shows the help page for a specific function.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 4: load_data</div>
<div>###################################################</div>
<div>data(diff_exp_example1)</div>
<div>head(diff_exp_example1)</div>
<div>## A test dataset with three columns, as follows:</div>
<div># pvalue logFC gene</div>
<div># 0.0001018 3.333461 VSTM2L</div>
<div># 0.0001392 3.822383 TBC1D2</div>
<div># This is typically the output of a differential expression analysis</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 5: map</div>
<div>###################################################</div>
<div>example1_mapped &lt;- string_db$map( diff_exp_example1, "gene", removeUnmappedRows = TRUE )</div>
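<div>## Our differential expression results are keyed by gene, so they have to be mapped to STRING protein IDs.</div>
<div>STRINGdb$help("map")</div>
<div># Read the help page to see how map is used and what it returns.</div>
<div># In essence it looks up the STRING protein ID for each entry in the gene column of the input data.frame and returns the data.frame with one extra column.</div>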
<div></div>
<div>###################################################</div>
<div>### code chunk number 6: STRINGdb.Rnw:118-121</div>
<div>###################################################</div>
<div>options(SweaveHooks=list(fig=function()</div>
<div>par(mar=c(2.1, 0.1, 4.1, 2.1))))</div>
<div>#par(mar=c(1.1, 0.1, 4.1, 2.1))))</div>
<div>## Sets the plotting parameters; nothing much to explain here.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 7: get_hits</div>
<div>###################################################</div>
<div>hits &lt;- example1_mapped$STRING_id[1:200]</div>
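<div># Here we simply take the first 200 proteins for the next step of the analysis.</div>
<div>## Keep in mind that this selection is only for demonstration; in practice you should pick your own set of differentially expressed genes.</div>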
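<div>## A hedged sketch of a more deliberate selection than "the first 200 rows": keep the proteins whose genes pass</div>
<div>## a p-value cutoff and a fold-change cutoff. select_hits, p_cut and lfc_cut are hypothetical names, and the</div>
<div>## default cutoffs (0.05 and 1) are illustrative assumptions, not values recommended by the package.</div>
<div>select_hits &lt;- function(mapped, p_cut = 0.05, lfc_cut = 1) {</div>
<div>keep &lt;- mapped$pvalue &lt; p_cut &amp; abs(mapped$logFC) &gt; lfc_cut</div>
<div>mapped$STRING_id[which(keep)] # which() drops rows where the test is NA</div>
<div>}</div>
<div>## Possible usage with the table mapped above: de_hits &lt;- select_hits(example1_mapped)</div>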
<div>###################################################</div>
<div>### code chunk number 8: plot_network</div>
<div>###################################################</div>
<div>string_db$plot_network( hits )</div>
<div></div>
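<div>## Protein IDs are all you need to draw a network; the more IDs, the longer it takes.</div>
<div>## The function looks up all PPI data for the input IDs in the STRING database and then draws the network.</div>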
<div>## STRINGdb$help("plot_network")</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 9: add_diff_exp_color</div>
<div>###################################################</div>
<div># filter by p-value and add a color column</div>
<div># (i.e. green for down-regulated genes and red for up-regulated genes)</div>
<div>example1_mapped_pval05 &lt;- string_db$add_diff_exp_color( subset(example1_mapped, pvalue&lt;0.05),</div>
<div>logFcColStr="logFC" )</div>
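<div>## The plain network above is usually not enough. For example, we may want to see whether each gene is up- or down-regulated, and by how much, which can be shown with shades of red and green.</div>
<div>## The object returned by add_diff_exp_color is still a data.frame, with an added color column.</div>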
<div>STRINGdb$help("add_diff_exp_color")</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 10: post_payload</div>
<div>###################################################</div>
<div># post payload information to the STRING server</div>
<div>payload_id &lt;- string_db$post_payload( example1_mapped_pval05$STRING_id,</div>
<div>colors=example1_mapped_pval05$color )</div>
<div></div>
<div>## add_diff_exp_color above added a color column to our data.frame; post_payload is still needed to pair each STRING protein ID with its color, and it returns a payload_id to pass to the plotting function.</div>
<div>STRINGdb$help("post_payload")</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 11: plot_halo_network</div>
<div>###################################################</div>
<div># display a STRING network png with the "halo"</div>
<div>string_db$plot_network( hits, payload_id=payload_id )</div>
<div></div>
<div>## This draws the network again, now with the color attribute added.</div>
<div>## As you can see, there are too many genes, so the plot is very crowded.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 13: plot_ppi_enrichment</div>
<div>###################################################</div>
<div># plot the enrichment for the best 1000 genes</div>
<div>string_db$plot_ppi_enrichment( example1_mapped$STRING_id[1:1000], quiet=TRUE )</div>
<div>STRINGdb$help("plot_ppi_enrichment")</div>
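<div>## I have not worked out what this call is for; going by the comment above, it plots the interaction enrichment for the top 1000 genes.</div>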
<div></div>
<div>###################################################</div>
<div>### code chunk number 14: enrichment</div>
<div>###################################################</div>
<div>enrichmentGO &lt;- string_db$get_enrichment( hits, category = "Process", methodMT = "fdr", iea = TRUE )</div>
<div>enrichmentKEGG &lt;- string_db$get_enrichment( hits, category = "KEGG", methodMT = "fdr", iea = TRUE )</div>
<div>head(enrichmentGO, n=7)</div>
<div>head(enrichmentKEGG, n=7)</div>
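<div>### Enrichment analysis runs directly on STRING protein IDs, and this function downloads some data automatically. By default the background is the full set of human proteins, but in most cases it should be changed, otherwise the p-values will not be accurate.</div>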
<div></div>
<div>#################################################</div>
<div># code chunk number 15: background (eval = FALSE)</div>
<div>#################################################</div>
<div># Change the background here: human has more than twenty thousand genes, and this restricts the background to just 2000.</div>
<div>backgroundV &lt;- example1_mapped$STRING_id[1:2000] # as an example, we use the first 2000 genes</div>
<div>string_db$set_background(backgroundV)</div>
<div>## string_db is a global variable. It was created for human with STRING v10 and has now been modified purely as a test, so be sure to change it back!</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 16: new_background_inst (eval = FALSE)</div>
<div>###################################################</div>
<div>string_db &lt;- STRINGdb$new( score_threshold=0, backgroundV = backgroundV )</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 17: enrichmentHeatmap (eval = FALSE)</div>
<div>###################################################</div>
<div>eh &lt;- string_db$enrichment_heatmap( list( hits[1:100], hits[101:200]),</div>
<div>list("list1","list2"), title="My Lists" )</div>
<div></div>
<div>## Now restore string_db to its original settings.</div>
<div>string_db &lt;- STRINGdb$new( version="10", species=9606,</div>
<div>score_threshold=0, input_directory="" )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 18: clustering1</div>
<div>###################################################</div>
<div># get clusters</div>
<div>clustersList &lt;- string_db$get_clusters(example1_mapped$STRING_id[1:600])</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 19: STRINGdb.Rnw:254-256</div>
<div>###################################################</div>
<div>options(SweaveHooks=list(fig=function()</div>
<div>par(mar=c(2.1, 0.1, 4.1, 2.1))))</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 20: clustering2</div>
<div>###################################################</div>
<div># plot first 4 clusters</div>
<div>par(mfrow=c(2,2))</div>
<div>for(i in seq_len(4)){</div>
<div>string_db$plot_network(clustersList[[i]])</div>
<div>}</div>
<div>## Draws the 4 clusters on the same canvas.</div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 21: proteins</div>
<div>###################################################</div>
<div>string_proteins &lt;- string_db$get_proteins()</div>
<div></div>
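<div>## Below are some other small utilities: finding the interaction between two proteins, the papers reporting a direct interaction between two proteins, and a protein's homologs in other species.</div>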
<div></div>
<div>###################################################</div>
<div>### code chunk number 22: atmtp</div>
<div>###################################################</div>
<div>tp53 = string_db$mp( "tp53" )</div>
<div>atm = string_db$mp( "atm" )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 23: neighbors (eval = FALSE)</div>
<div>###################################################</div>
<div>## string_db$get_neighbors( c(tp53, atm) )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 24: interactions</div>
<div>###################################################</div>
<div>string_db$get_interactions( c(tp53, atm) )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 25: pubmedInteractions (eval = FALSE)</div>
<div>###################################################</div>
<div>## string_db$get_pubmed_interaction( tp53, atm )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 26: homologs (eval = FALSE)</div>
<div>###################################################</div>
<div>## # get the reciprocal best hits of the following protein in all the STRING species</div>
<div>## string_db$get_homologs_besthits(tp53, symbets = TRUE)</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 27: homologs2 (eval = FALSE)</div>
<div>###################################################</div>
<div>## # get the homologs of the following two proteins in the mouse (i.e. species_id=10090)</div>
<div>## string_db$get_homologs(c(tp53, atm), target_species_id=10090, bitscore_threshold=60 )</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 28: benchmark1</div>
<div>###################################################</div>
<div>data(interactions_example)</div>
<div></div>
<div>interactions_benchmark = string_db$benchmark_ppi(interactions_example, pathwayType = "KEGG",</div>
<div>max_homology_bitscore = 60, precision_window = 400, exclude_pathways = "blacklist")</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 29: STRINGdb.Rnw:391-393</div>
<div>###################################################</div>
<div>options(SweaveHooks=list(fig=function()</div>
<div>par(mar=c(4.1, 4.1, 4.1, 2.1))))</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 30: benchmark2</div>
<div>###################################################</div>
<div>plot(interactions_benchmark$precision, ylim=c(0,1), type="l", xlim=c(0,700),</div>
<div>xlab="interactions", ylab="precision")</div>
<div></div>
<div></div>
<div>###################################################</div>
<div>### code chunk number 31: benchmark3</div>
<div>###################################################</div>
<div>interactions_pathway_view = string_db$benchmark_ppi_pathway_view(interactions_benchmark, precision_threshold=0.2, pathwayType = "KEGG")</div>
<div>head(interactions_pathway_view)</div>
<div></div>
<div></div>
<div></div>
<div></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2041.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Downloading the latest protein-protein interaction database: STRING</title>
		<link>http://www.bio-info-trainee.com/1589.html</link>
		<comments>http://www.bio-info-trainee.com/1589.html#comments</comments>
		<pubDate>Thu, 28 Apr 2016 12:02:47 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[PPI]]></category>
		<category><![CDATA[string]]></category>
		<category><![CDATA[protein-protein interaction]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1589</guid>
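		<description><![CDATA[STRING is the most complete and the most popular database in the PPI field. Directly on Google &#8230; <a href="http://www.bio-info-trainee.com/1589.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>STRING is the most complete and the most popular database in the PPI field. Search for PPI on Google and the first thing you see is the STRING website. Their homepage is now built with HTML5 and looks quite polished: <a href="http://string-db.org/">http://string-db.org/</a></p>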
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/11.png"><img class="alignnone size-full wp-image-1590" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/11.png" alt="1" width="425" height="342" /></a></p>
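<p>The headline figure is impressive: nearly 200 million records. Most people only care about a single species, though, and for human there are in fact fewer than ten million.</p>
<p>We go straight to the download page and find the human data; the species ID for human is 9606.</p>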
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/21.png"><img class="alignnone size-full wp-image-1591" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/04/21.png" alt="2" width="316" height="310" /></a></p>
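<p>A license is required to download the full version, so here I test with the public dataset at the top of the list.</p>
<p>The data is very simple: protein + protein + score, just over eight million rows in total, recording every possible and credible protein interaction that STRING has collected. The protein IDs are Ensembl IDs, however, so they need to be converted to gene IDs before most people can use them. Most researchers work at the gene level, so protein interactions are treated as roughly equivalent to gene interactions.</p>
<p>For the ID conversion I recommend the R package org.Hs.eg.db, which makes it easy.</p>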
<blockquote>
<pre>&gt; library(org.Hs.eg.db)
&gt; tmp=toTable(org.Hs.egENSEMBLPROT)
&gt; dim(tmp)
[1] 110916      2
&gt; head(tmp)
  gene_id         prot_id
1       1 ENSP00000263100
2       1 ENSP00000470909
3       2 ENSP00000443302
4       2 ENSP00000323929
5       2 ENSP00000438599
6       2 ENSP00000445717
</pre>
</blockquote>
<p>Around 500 protein IDs cannot be converted to a gene. That is normal: IDs of this kind are inherently unstable, and many of them are retired over time.</p>
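<p>A minimal sketch of the conversion, under stated assumptions: STRING identifiers carry a species prefix ("9606." for human) in front of the Ensembl protein ID, so the prefix is stripped before looking the ID up in the mapping table. The toy tables below stand in for the downloaded links file and for toTable(org.Hs.egENSEMBLPROT). The column names protein1, protein2 and combined_score are assumed, strip_prefix is a hypothetical helper, and apart from the two ENSP IDs taken from the console output, the IDs and scores are made up.</p>
<pre>
# Toy stand-in for the STRING links file: protein + protein + score
links &lt;- data.frame(
  protein1 = c("9606.ENSP00000263100", "9606.ENSP00000263100"),
  protein2 = c("9606.ENSP00000323929", "9606.ENSP00000000000"),
  combined_score = c(700, 150),
  stringsAsFactors = FALSE
)

# Toy stand-in for toTable(org.Hs.egENSEMBLPROT)
prot2gene &lt;- data.frame(
  gene_id = c("1", "2"),
  prot_id = c("ENSP00000263100", "ENSP00000323929"),
  stringsAsFactors = FALSE
)

# Drop the "9606." species prefix, then look up the gene ID for each protein
strip_prefix &lt;- function(x) sub("^9606\\.", "", x)
links$gene1 &lt;- prot2gene$gene_id[match(strip_prefix(links$protein1), prot2gene$prot_id)]
links$gene2 &lt;- prot2gene$gene_id[match(strip_prefix(links$protein2), prot2gene$prot_id)]

# Rows containing NA involve a protein that could not be mapped to a gene
n_unmapped &lt;- sum(is.na(links$gene1) | is.na(links$gene2))
</pre>
<p>In this sketch a protein that maps to no gene simply yields NA, which makes the unconvertible IDs easy to count and filter out.</p>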
<p>Once converted, the data can be loaded into a database and used by other visualization or analysis programs.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1589.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A roundup of protein-protein interaction (PPI) databases</title>
		<link>http://www.bio-info-trainee.com/1340.html</link>
		<comments>http://www.bio-info-trainee.com/1340.html#comments</comments>
		<pubDate>Thu, 14 Jan 2016 12:09:23 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[PPI]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1340</guid>
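		<description><![CDATA[I recently had a project that required exploring the relationships among the genes in a gene list, so I thought of &#8230; <a href="http://www.bio-info-trainee.com/1340.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>I recently had a project that required exploring the relationships among the genes in a gene list, so I thought of databases of interactions between the proteins those genes encode, and found that there are a great many of them.</div>
<div>A fairly comprehensive index: a compendium of PPI databases can be found at <a href="http://www.pathguide.org/">http://www.pathguide.org/</a>.</p>
<div>It lists a very large number of databases; for human alone, the search returns:</div>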
<div>
<p>Your search returned <b>207</b> results in <b>9</b> categories with the following search parameters:</p>
<ul>
<li>Organisms: <a href="http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606">Homo sapiens (Human)</a></li>
<li>Availability: Free to all users</li>
<li>Standards: all</li>
</ul>
</div>
<div><span style="color: #333333; font-family: arial;">The six major human PPI databases are: </span>Analysis of human interactome PPI data showing the coverage of six major primary databases (<b>BIND, BioGRID, DIP, HPRD, IntAct, and MINT</b>), according to the integration provided by the meta-database APID.</div>
<div>
<table border="1" width="693" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td>BIND</td>
<td>the biomolecular interaction network database</td>
<td>dead link</td>
</tr>
<tr>
<td>DIP</td>
<td>the database of interacting proteins</td>
<td><a href="http://dip.doe-mbi.ucla.edu/">http://dip.doe-mbi.ucla.edu/ </a></td>
</tr>
<tr>
<td>MINT</td>
<td>the molecular interaction database</td>
<td><a href="http://mint.bio.uniroma2.it/mint/">http://mint.bio.uniroma2.it/mint/ </a></td>
</tr>
<tr>
<td>STRING</td>
<td>Search Tool for the Retrieval of Interacting Genes/Proteins</td>
<td><a href="http://string-db.org/">http://string-db.org/  </a></td>
</tr>
<tr>
<td>HPRD</td>
<td>Human protein reference database</td>
<td><a href="http://www.hprd.org/">http://www.hprd.org/ </a></td>
</tr>
<tr>
<td>BioGRID</td>
<td>The Biological General Repository for Interaction Datasets</td>
<td><a href="http://thebiogrid.org/">http://thebiogrid.org/ </a></td>
</tr>
</tbody>
</table>
</div>
<div><span style="color: #333333; font-family: arial;">Most of these databases still have maintainers and are updated continuously. Each update can yield a paper, and the papers describing these databases are typically cited more than a thousand times. If you build a database and only a dozen or so people cite it, you are basically playing by yourself.</span></div>
<div><span style="color: #333333; font-family: arial;">See: <a href="http://openwetware.org/wiki/Protein-protein_interaction_databases">http://openwetware.org/wiki/Protein-protein_interaction_databases</a></span></div>
<div><span style="color: #333333; font-family: arial;">One of the more useful ones is</span> from the University of Pittsburgh, Pennsylvania: <a href="http://severus.dbmi.pitt.edu/wiki-pi/">http://severus.dbmi.pitt.edu/wiki-pi/</a></div>
<div><span style="color: #333333; font-family: arial;"><a href="http://severus.dbmi.pitt.edu/wiki-pi/index.php/pair/view/3838/7157">http://severus.dbmi.pitt.edu/wiki-pi/index.php/pair/view/3838/7157</a></span></div>
<div></div>
<div>(a) <em>PPI definition</em>; a definition of a protein-to-protein interaction compared to other biomolecular relationships or associations.</div>
<div>(b) <em>PPI determination by two alternative approaches: binary and co-complex</em>; a description of the PPIs determined by the two main types of experimental technologies.</div>
<div>(c) <em>The main databases and repositories that include PPIs</em>; a description and comparison of the main databases and repositories that include PPIs, indicating the type of data that they collect with a special distinction between experimental and predicted data.</div>
<div>(d) <em>Analysis of coverage and ways to improve PPI reliability</em>; a comparative study of the current coverage on PPIs and presentation of some strategies to improve the reliability of PPI data.</div>
<div>(e) <em>Networks derived from PPIs compared to canonical pathways</em>; a practical example that compares the characteristics and information provided by a canonical pathway and the PPI network built for the same proteins. Last, a short summary and guidance for learning more is provided.</div>
<div></div>
<div>Current PPI databases all hold limited data, but they keep growing. Data generally enters a database in one of the following four ways.</div>
<div>There are four common approaches for PPI data expansions:</div>
<div>1) manual curation from the biomedical literature by experts;</div>
<div>2) automated PPI data extraction from biomedical literature with text mining methods;</div>
<div>3) computational inference based on interacting protein domains or co-regulation relationships, often derived from data in model organisms; and</div>
<div>4) data integration from various experimental or computational sources.</div>
<div>Partly due to the difficulty of evaluating qualities for PPI data, a majority of widely-used PPI databases, including DIP, BIND, MINT, HPRD, and IntAct, take a "conservative approach" to PPI data expansion by<span class="Apple-converted-space"> </span><b>adding only manually curated interactions.</b><span class="Apple-converted-space"> </span>Therefore, the coverage of the protein interactome developed using this approach is poor.</div>
<div></div>
<div>In the second literature mining approach, computer software replaces database curators to extract protein interaction (or, association) data from large volumes of biomedical literature. Due to the complexity of natural language processing techniques involved, however, this approach often generates a large number of false positive protein "associations" that are not truly biologically significant "interactions".</div>
<div></div>
<div>The challenge for the integrative approach is<span class="Apple-converted-space"> </span><b>how to balance quality with coverage.</b></div>
<div>In particular, different databases may contain much redundant PPI information derived from the same sources, while the overlaps between independently derived PPI data sets are quite low.</div>
<div></div>
<div><span style="color: #333333; font-family: arial;">References:</span></div>
<div><span style="color: #333333; font-family: arial;">The HAPPI database, published in 2009: <a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-10-S1-S16#CR6_2544">http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-10-S1-S16#CR6_2544</a><span class="Apple-converted-space"> </span>(an integration of the </span>HPRD [<a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-10-S1-S16#CR11_2544">11</a>], BIND [<a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-10-S1-S16#CR20_2544">20</a>], MINT [<a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-10-S1-S16#CR21_2544">21</a>], STRING [<a href="http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-10-S1-S16#CR26_2544">26</a>], and OPHID databases)</div>
<div>Reviews from 2010: <span style="color: #333333; font-family: arial;"><a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000807">http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000807</a></span></div>
<div><span style="color: #333333; font-family: arial;"><a href="http://bib.oxfordjournals.org/content/early/2010/09/16/bib.bbq064.full">http://bib.oxfordjournals.org/content/early/2010/09/16/bib.bbq064.full</a></span></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1340.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
