<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Database updates</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e6%95%b0%e6%8d%ae%e5%ba%93%e6%9b%b4%e6%96%b0/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>You are welcome to join the discussion on the biotrainee.com forum, or follow our WeChat official account of the same name, biotrainee.</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Downloading the latest KEGG information and parsing it</title>
		<link>http://www.bio-info-trainee.com/1188.html</link>
		<comments>http://www.bio-info-trainee.com/1188.html#comments</comments>
		<pubDate>Tue, 08 Dec 2015 12:53:41 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[KEGG]]></category>
		<category><![CDATA[Database updates]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1188</guid>
		<description><![CDATA[Open the official KEGG site: http://www.genome.jp/kegg-bin/get_h &#8230; <a href="http://www.bio-info-trainee.com/1188.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Open the official KEGG site: <a href="http://www.genome.jp/kegg-bin/get_htext?hsa00001+3101">http://www.genome.jp/kegg-bin/get_htext?hsa00001+3101</a></p>
<p><a href="http://www.genome.jp/kegg-bin/get_htext#A1">http://www.genome.jp/kegg-bin/get_htext#A1</a> (this link does not seem to open)</p>
<p>The download link can be found on that page:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image001.png"><img class="alignnone size-full wp-image-1189" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image001.png" alt="image001" width="614" height="405" /></a></p>
<p>The download is a plain-text file, and its hierarchical structure is very clear:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image002.png"><img class="alignnone size-full wp-image-1190" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image002.png" alt="image002" width="545" height="342" /></a></p>
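<p>Lines starting with C hold the KEGG pathway IDs, and the D lines that follow a C line list all the genes belonging to that pathway.</p>
<p>Lines starting with A and B are the KEGG categories: 6 top-level categories and 42 subcategories in total.</p>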
<p><strong>grep ^A hsa00001.keg </strong></p>
<p>A&lt;b&gt;Metabolism&lt;/b&gt;</p>
<p>A&lt;b&gt;Genetic Information Processing&lt;/b&gt;</p>
<p>A&lt;b&gt;Environmental Information Processing&lt;/b&gt;</p>
<p>A&lt;b&gt;Cellular Processes&lt;/b&gt;</p>
<p>A&lt;b&gt;Organismal Systems&lt;/b&gt;</p>
<p>A&lt;b&gt;Human Diseases&lt;/b&gt;</p>
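<p>The same checks can be sketched in Python. This is a minimal sketch, assuming each line of hsa00001.keg begins with its level letter (A, B, C or D) and that A and B names are wrapped in &lt;b&gt; tags as in the grep output above; lines that carry only the letter and no name (if present) are skipped. It counts the named A and B lines and lists the top-level names with the tags removed. The function names are hypothetical, not part of any KEGG tool.</p>

```python
import re
from collections import Counter

# Removes simple markup such as <b> and </b> around category names.
TAG_RE = re.compile(r"</?[a-zA-Z]+>")


def category_name(line):
    """Return the tag-free name on an A/B line, or '' if the line has none."""
    return TAG_RE.sub("", line[1:]).strip()


def summarize_categories(lines):
    """Count named A/B lines and collect the top-level (A) category names."""
    counts = Counter()
    top_level = []
    for line in lines:
        level = line[:1]
        if level not in ("A", "B"):
            continue
        name = category_name(line)
        if not name:  # assumption: a bare "B" line is a spacer, not a category
            continue
        counts[level] += 1
        if level == "A":
            top_level.append(name)
    return counts, top_level
```

<p>For example, <code>summarize_categories(open("hsa00001.keg", encoding="utf-8"))</code> returns the A/B counts together with the six top-level names, which can be compared against the 6 and 42 quoted above.</p>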
<p>You can also see that, as of now (8 December 2015, 20:26:57), there are 343 KEGG pathways in total:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image003.png"><img class="alignnone size-full wp-image-1191" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image003.png" alt="image003" width="768" height="301" /></a></p>
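<p>A total like this comes from counting C lines, but C lines are not all of one kind: some end in a [PATH:...] tag and others, such as the entries listed further down in this post, end in a [BR:...] tag. A minimal sketch that tallies C lines by tag, assuming the tag appears in square brackets at the end of the line as in the examples here; C lines without a recognizable tag are counted as "none". The function name is hypothetical.</p>

```python
import re
from collections import Counter

# Captures the tag type and ID on a C line, e.g. [PATH:hsa00010] or [BR:hsa04030].
BRACKET_TAG_RE = re.compile(r"\[(PATH|BR):([a-z]+\d+)\]")


def count_c_entries(lines):
    """Tally C lines by tag type: 'PATH', 'BR', or 'none' when no tag is found."""
    counts = Counter()
    for line in lines:
        if not line.startswith("C"):
            continue
        match = BRACKET_TAG_RE.search(line)
        counts[match.group(1) if match else "none"] += 1
    return counts
```

<p>The sum of the returned counts is the plain number of C lines; the "PATH" count alone is the number of entries that link to a pathway map.</p>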
<p>Next, let's parse this information:</p>
<p>perl -alne '{if(/^C/){/PATH:hsa(\d+)/;$kegg=$1}else{print "$kegg\t$F[1]" if /^D/ and $kegg;}}' hsa00001.keg &gt;kegg2gene.txt</p>
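<p>For readers who prefer Python, here is a minimal sketch of the same logic as the Perl one-liner: a C line sets the current pathway ID from the digits after PATH:hsa, and each D line under it contributes its second whitespace-separated field (the gene ID). One explicit choice made here: a C line without a PATH:hsa tag resets the current ID to None, so the D lines beneath it are skipped. The function names are hypothetical.</p>

```python
import re

# The pathway number follows "PATH:hsa" on C lines, e.g. [PATH:hsa00010].
PATH_RE = re.compile(r"PATH:hsa(\d+)")


def pathway_gene_pairs(lines):
    """Yield (pathway_id, gene_id) pairs from the lines of a .keg file."""
    current = None
    for line in lines:
        if line.startswith("C"):
            match = PATH_RE.search(line)
            current = match.group(1) if match else None
        elif line.startswith("D") and current:
            fields = line.split()
            if len(fields) > 1:
                yield current, fields[1]  # second field is the gene ID


def to_tsv(pairs):
    """Format pairs as tab-separated lines, like the one-liner's output."""
    return "".join(f"{pathway}\t{gene}\n" for pathway, gene in pairs)
```

<p>Writing <code>to_tsv(pathway_gene_pairs(open("hsa00001.keg", encoding="utf-8")))</code> to kegg2gene.txt should give the same two-column layout as the one-liner.</p>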
<p>This gives:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image004.png"><img class="alignnone size-full wp-image-1192" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image004.png" alt="image004" width="745" height="326" /></a></p>
<p>However, I noticed a problem: some pathways have no genes at all, and I don't quite understand why:</p>
<p><strong>C    04030 G protein-coupled receptors [BR:hsa04030]</strong></p>
<p><strong>C    01020 Enzyme-linked receptors [BR:hsa01020]</strong></p>
<p><strong>C    04050 Cytokine receptors [BR:hsa04050]</strong></p>
<p><strong>C    03310 Nuclear receptors [BR:hsa03310]</strong></p>
<p><strong>C    04040 Ion channels [BR:hsa04040]</strong></p>
<p><strong>C    04031 GTP-binding proteins [BR:hsa04031]</strong></p>
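<p>One way to investigate is to check the file itself. The one-liner only takes an ID from C lines containing PATH:hsa, while the entries above carry [BR:...] tags, so it helps to know whether these entries are empty in hsa00001.keg or only missing from kegg2gene.txt. A minimal sketch that lists C lines with no D lines beneath them, assuming a C entry's genes are the D lines that follow it up to the next A, B or C line; other lines are ignored. The function name is hypothetical.</p>

```python
def empty_c_entries(lines):
    """Return C lines that have no D line before the next A, B or C line."""
    empty = []
    current = None      # the C line currently being checked, if any
    has_genes = False   # whether a D line has been seen under it
    for line in lines:
        level = line[:1]
        if level == "D":
            has_genes = True
        elif level in ("A", "B", "C"):
            if current is not None and not has_genes:
                empty.append(current)
            current = line.strip() if level == "C" else None
            has_genes = False
    if current is not None and not has_genes:
        empty.append(current)
    return empty
```

<p>If the [BR:...] entries do not show up in this list, their genes are present in the file and are only dropped by the PATH:hsa filter.</p>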
<p>Next, let's look at how the KEGG data has been updated.</p>
<p>First, let's look at the data bundled with the org.Hs.eg.db R package:</p>
<p>Date for KEGG data: 2011-Mar15</p>
<p>org.Hs.egPATH has <strong>5869 entrez genes and 229 pathways</strong></p>
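<p>When I used it in August 2015, it had <strong>6901 entrez genes and 295 pathways</strong>.</p>
<p>Now it has 299 pathways and 6992 genes.</p>
<p>So the updates are actually quite slow, which means that even though many people still use web tools such as DAVID for KEGG enrichment analysis, the results are not very different!</p>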
<p>For details, see <a href="http://www.genome.jp/kegg/pathway.html">http://www.genome.jp/kegg/pathway.html</a></p>
<p>For update information, see: <a href="http://www.genome.jp/kegg/docs/upd_map.html">http://www.genome.jp/kegg/docs/upd_map.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1188.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
