<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Computer Basics</title>
	<atom:link href="http://www.bio-info-trainee.com/category/%e8%ae%a1%e7%ae%97%e6%9c%ba%e5%9f%ba%e7%a1%80/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
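	<description>Feel free to leave a comment and join the discussion on the forum biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>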
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>What to do when an Ubuntu source you added is rejected by the server?</title>
		<link>http://www.bio-info-trainee.com/2378.html</link>
		<comments>http://www.bio-info-trainee.com/2378.html#comments</comments>
		<pubDate>Fri, 24 Mar 2017 03:13:36 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[未分类]]></category>
		<category><![CDATA[计算机基础]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2378</guid>
		<description><![CDATA[A long, long time ago I wrote a series of server tutorials: http://www.bio-info- &#8230; <a href="http://www.bio-info-trainee.com/2378.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
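				<content:encoded><![CDATA[<p>A long, long time ago I wrote a series of server tutorials: http://www.bio-info-trainee.com/555.html<br />
In it I left one question open. I never got another chance to tinker with servers, so it stayed unresolved. The problem is as follows:<span id="more-2378"></span></p>
<p>If the R source you added is rejected by your server, you are in trouble:<br />
The following signatures couldn't be verified because the public key is not<br />
(that is, the signatures cannot be verified because the public key is missing)</p>
<p>For bioinformatics beginners, Ubuntu is the most convenient Linux server, bar none, because it has apt-get.<br />
For example, to install R I only need to add the R mirror of Xiamen University or Peking University to the apt-get sources file, and apt-get then downloads and installs it automatically.</p>
<p>If the source you added is not trusted by your server, you are in trouble, but it can still be fixed:</p>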
<p>http://askubuntu.com/questions/13065/how-do-i-fix-the-gpg-error-no-pubkey</p>
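<p>For example, at the bottom of /etc/apt/sources.list I added Xiamen University's R mirror for Ubuntu 16.04 LTS:<br />
deb http://mirrors.xmu.edu.cn/CRAN/bin/linux/ubuntu/ xenial/<br />
Running sudo apt-get update (to refresh the sources) then hit this problem. Importing the missing public key fixes it:</p>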
<p>ubuntu@ip-172-31-2-206:~$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 51716619E084DAB9<br />
Executing: /tmp/tmp.QcuTMmu82U/gpg.1.sh --keyserver<br />
keyserver.ubuntu.com<br />
--recv-keys<br />
51716619E084DAB9<br />
gpg: requesting key E084DAB9 from hkp server keyserver.ubuntu.com<br />
gpg: key E084DAB9: public key "Michael Rutter &lt;marutter@gmail.com&gt;" imported<br />
gpg: Total number processed: 1<br />
gpg: imported: 1 (RSA: 1)<br />
ubuntu@ip-172-31-2-206:~$ sudo apt-get update<br />
Hit:1 http://us-west-2.ec2.archive.ubuntu.com/ubuntu xenial InRelease<br />
Hit:2 http://us-west-2.ec2.archive.ubuntu.com/ubuntu xenial-updates InRelease<br />
Hit:3 http://us-west-2.ec2.archive.ubuntu.com/ubuntu xenial-backports InRelease<br />
Get:4 http://security.ubuntu.com/ubuntu xenial-security InRelease [102 kB]<br />
Hit:5 http://ppa.launchpad.net/webupd8team/y-ppa-manager/ubuntu xenial InRelease<br />
Get:6 http://mirrors.xmu.edu.cn/CRAN/bin/linux/ubuntu xenial/ InRelease [3,590 B]<br />
Fetched 106 kB in 0s (209 kB/s)<br />
Reading package lists... Done<br />
ubuntu@ip-172-31-2-206:~$</p>
<p>Problem solved!<br />
~ sudo apt-get install r-base-core # install the R package again<br />
~ R --version # check the R version<br />
The installation is very slow and may take several hours.</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2017/03/ES-ubuntu16-install-R.png"><img class="alignnone size-full wp-image-2380" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/03/ES-ubuntu16-install-R.png" alt="es-ubuntu16-install-r" width="774" height="198" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2378.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tutorials I have written with rmarkdown</title>
		<link>http://www.bio-info-trainee.com/2372.html</link>
		<comments>http://www.bio-info-trainee.com/2372.html#comments</comments>
		<pubDate>Wed, 15 Mar 2017 09:16:05 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[DESeq2]]></category>
		<category><![CDATA[GEOquery]]></category>
		<category><![CDATA[limma]]></category>
		<category><![CDATA[rmarkdown]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2372</guid>
		<description><![CDATA[Writing tutorials with rmarkdown is really convenient, especially for R-related topics such as using R packages, &#8230; <a href="http://www.bio-info-trainee.com/2372.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
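				<content:encoded><![CDATA[<div>Writing tutorials with rmarkdown is really convenient, especially for R-related topics such as using R packages, visualization, or statistics. Below is a short list of tutorials I have written this way. They are richly illustrated and, more importantly, effortless to produce: no typesetting, no uploading or organizing images.</div>
<p>In general, the file name at the end of each link tells you what the article is about:</p>
<div>First, walkthroughs of several R packages:<br />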
<a href="http://www.bio-info-trainee.com/bioconductor_China/software/limma.html" target="_blank">http://www.bio-info-trainee.com/ ... software/limma.html</a><br />
<a href="http://www.bio-info-trainee.com/bioconductor_China/software/DESeq2.html" target="_blank">http://www.bio-info-trainee.com/ ... oftware/DESeq2.html</a><br />
<a href="http://www.bio-info-trainee.com/bioconductor_China/software/GEOquery.html" target="_blank">http://www.bio-info-trainee.com/ ... tware/GEOquery.html</a><br />
<a href="http://www.bio-info-trainee.com/bioconductor_China/software/limma_voom.html" target="_blank">http://www.bio-info-trainee.com/ ... are/limma_voom.html</a><br />
Of course, I occasionally write tutorials for packages that are not from Bioconductor as well:<br />
<a href="http://www.bio-info-trainee.com/bioconductor_China/software/GOplot.html" target="_blank">http://www.bio-info-trainee.com/ ... oftware/GOplot.html</a><br />
<a href="http://www.bio-info-trainee.com/bioconductor_China/software/Rcircos.html" target="_blank">http://www.bio-info-trainee.com/ ... ftware/Rcircos.html</a></div>
<p><span id="more-2372"></span></p>
<div></div>
<div>Below is a walkthrough of logical analysis in statistics:</div>
<div><a href="http://www.bio-info-trainee.com/tmp/tutorial_for_logical_analysis.html">http://www.bio-info-trainee.com/tmp/tutorial_for_logical_analysis.html</a></div>
<div>Below is how to make 15 common visualization plots for an expression matrix:</div>
<div><a href="http://bio-info-trainee.com/tmp/basic_visualization_for_expression_matrix.html">http://bio-info-trainee.com/tmp/basic_visualization_for_expression_matrix.html</a></div>
<div></div>
<div>
<h1 class="title toc-ignore">Plotting COSMIC mutation signatures with deconstructSigs</h1>
</div>
<div><a href="http://biotrainee.com/jmzeng/markdown/deconstuctSigs.html" target="_blank">http://biotrainee.com/jmzeng/markdown/deconstuctSigs.html</a></div>
<div></div>
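<div>This ANOVA tutorial, the most comprehensive I have seen, was not written by me, but it is excellent, so I will not reinvent the wheel:</div>
<div><a href="http://biotrainee.com/jmzeng/markdown/ANOVA.html" target="_blank">http://biotrainee.com/jmzeng/markdown/ANOVA.html  </a>Recommended reading</div>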
<div></div>
<div>
<h1 class="title toc-ignore">Table of contents of a standard genetic testing report  <a href="http://www.biotrainee.com/jmzeng/blogMyGenome/name_introduction.html" target="_blank">http://www.biotrainee.com/jmzeng/blogMyGenome/name_introduction.html</a></h1>
</div>
<div></div>
<div></div>
<div></div>
<h1><strong><span style="color: #ff0000;">Below are a number of final project reports for high-throughput sequencing analyses:</span></strong></h1>
<div></div>
<div>Simple <span style="color: #6e8b3d;">RNA-seq</span> final project report</div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/Ref_RNAseq_result/index.html" target="_blank">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/Ref_RNAseq_result/index.html</a></div>
<div></div>
<div>
<div>16S rDNA hypervariable region sequencing: final project report</div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/16sRNA/index.html">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/16sRNA/index.html</a></div>
<div></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/MetaGenome_result/index.html">Demo: metagenome analysis final report</a></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/MetaGenome_result/index.html">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/MetaGenome_result/index.html</a></div>
<div></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/Pacbio_Genome_result/index.html">Demo: bacterial genome analysis final report</a></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/Pacbio_Genome_result/index.html">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/Pacbio_Genome_result/index.html</a></div>
<div></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/SmallRNA_result/index.html">Demo: small RNA final project report</a></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/SmallRNA_result/index.html">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/SmallRNA_result/index.html</a></div>
<div></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/lncRNA_result/index.html">Demo: lncRNA final project report</a></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/lncRNA_result/index.html">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/lncRNA_result/index.html</a></div>
<div></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/chip-report/index.html">Demo: ChIP-Seq final report</a></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/chip-report/index.html">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/chip-report/index.html</a></div>
<div></div>
<div></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/Denovo_transcriptome/index.html">Demo: transcriptome sequencing (de novo) final project report</a></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/Denovo_transcriptome/index.html">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/Denovo_transcriptome/index.html</a></div>
<div></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/WGCNA_Traits_result/index.html">Demo: WGCNA analysis final report</a></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/WGCNA_Traits_result/index.html">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/WGCNA_Traits_result/index.html</a></div>
<div></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/iTRAQ_Result/index.html">Protein iTRAQ quantification: final project report</a></div>
<div><a href="http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/iTRAQ_Result/index.html">http://www.biotrainee.com/jmzeng/html_report/d/e/e/p/i/n/iTRAQ_Result/index.html</a></div>
<div></div>
</div>
<div></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2372.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Genes with odd prefixes among gene symbols</title>
		<link>http://www.bio-info-trainee.com/2129.html</link>
		<comments>http://www.bio-info-trainee.com/2129.html#comments</comments>
		<pubDate>Sun, 11 Dec 2016 00:48:20 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[基础数据格式]]></category>
		<category><![CDATA[entrez ID]]></category>
		<category><![CDATA[symbol]]></category>
		<category><![CDATA[基因对应]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2129</guid>
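		<description><![CDATA[I originally wrote this as a basic knowledge note for the fundamentals board of the forum, but it got very few views and I could not bear to let it gather dust, so &#8230; <a href="http://www.bio-info-trainee.com/2129.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I originally wrote this as a basic knowledge note for the fundamentals board of the forum, but it got very few views. I could not bear to let it gather dust, so I am republishing it on the blog! Original post: <a href="http://www.biotrainee.com/thread-511-1-1.html" target="_blank">http://www.biotrainee.com/thread-511-1-1.html</a></p>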
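<p>Gene symbols are official: they are maintained by HUGO, which has a dedicated database, HGNC database of human gene names | HUGO<br />
When analyzing data in the past, I came across gene symbols that looked odd and puzzled me, for example<br />
the C orf series,<br />
the HS. series,<br />
the KRTAP series,<br />
the LOC series,<br />
the MIR series,<br />
the LINC series<br />
Each of these series often contains several hundred genes;<br />
C12orf44; Chromosome 12 Open Reading Frame 44;  this is what the C orf series means<br />
The MIR series should be miRNA-related genes<br />
The LINC series should be long intergenic non-protein coding RNA<br />
The LOC series is unofficial and putative; these names may later be replaced by more suitable ones<br />
I have prepared the complete gene mapping table; download it from the 生信菜鸟团 QQ group. It covers 47938 genes in total, mapping symbol, entrez gene id, name, and alias!</p>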
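<p>The prefix families listed above can be told apart with a few regular expressions. The following is a minimal sketch in base R: the function name classify_symbol, the patterns, and the example identifiers are assumptions made for illustration, not official HGNC rules.</p>

```r
# Classify gene identifiers by prefix family.
# The regular expressions are illustrative assumptions, not official HGNC rules.
classify_symbol <- function(symbols) {
  patterns <- c(
    "C orf" = "^C([0-9]+|X|Y)orf[0-9]+$",  # e.g. C12orf44
    "HS."   = "^H[Ss]\\.[0-9]+$",          # UniGene-style IDs, not real symbols
    "KRTAP" = "^KRTAP",
    "LOC"   = "^LOC[0-9]+$",
    "MIR"   = "^MIR",
    "LINC"  = "^LINC[0-9]+$"
  )
  family <- rep("other", length(symbols))
  for (name in names(patterns)) {
    # Only label identifiers that no earlier pattern has claimed.
    hit <- family == "other" & grepl(patterns[[name]], symbols)
    family[hit] <- name
  }
  family
}

# Example identifiers chosen only to exercise each pattern.
example_ids <- c("C12orf44", "Hs.12345", "KRTAP5-1", "LOC100130417",
                 "MIR21", "LINC00115", "TP53")
families <- classify_symbol(example_ids)
print(table(families))
```

<p>Run on a full symbol table, the same call gives a quick count of how many genes fall into each family.</p>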
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/12.png"><img class="alignnone size-full wp-image-2130" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/12.png" alt="1" width="535" height="450" /></a><br />
Some RNA genes have no symbol at all, for example the CTA/B/C/D series:<br />
Aliases for ENSG00000271971 Gene<br />
Quality Score for this RNA gene is 1<br />
Aliases for ENSG00000271971 Gene<br />
CTD-2006H14.2 5<br />
External Ids for ENSG00000271971 Gene<br />
Ensembl: ENSG00000271971<br />
Also, if you see a gene starting with HS., that is a UniGene ID and no longer a symbol.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2129.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Trilogy on mapping microarray probes to genes with R: downloading the mapping from NCBI</title>
		<link>http://www.bio-info-trainee.com/2126.html</link>
		<comments>http://www.bio-info-trainee.com/2126.html#comments</comments>
		<pubDate>Sun, 11 Dec 2016 00:34:42 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[基础数据库]]></category>
		<category><![CDATA[对应关系]]></category>
		<category><![CDATA[芯片探针与基因]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2126</guid>
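		<description><![CDATA[This is part of a series; please read this first: Trilogy on mapping microarray probes to genes with R: biocondu &#8230; <a href="http://www.bio-info-trainee.com/2126.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This is part of a series; please read this first:</p>
<h2><a title="详细阅读 用R获取芯片探针与基因的对应关系三部曲-bioconductor" href="http://www.bio-info-trainee.com/1399.html" rel="bookmark">Trilogy on mapping microarray probes to genes with R: bioconductor</a></h2>
<p>NCBI now hosts more than ten thousand GPL platforms, while Bioconductor offers fewer than a thousand chip annotation packages. Bioconductor covers most of our needs, such as Affymetrix's 95 and 133 series, the 1.0 ST series, and the HTA 2.0 series, but for a rarely used chip nobody will go out of their way to build a Bioconductor package. In that case you need to download the GPL information from NCBI yourself, which can also be done in R:</p>
<p><strong><span style="color: #ff0000;">## In essence, you download a file, read it into R, and parse its rows and columns to obtain the probe-to-gene mapping. Read the code that follows and it will make sense.</span></strong><span id="more-2126"></span></p>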
<p>## A-AGIL-28 - Agilent Whole Human Genome Microarray 4x44K 014850 G4112F (85 cols x 532 rows)<br />
library(Biobase)<br />
library(GEOquery)<br />
# Download the GPL file into the current directory and load it:<br />
gpl &lt;- getGEO('GPL6480', destdir=".")<br />
dim(Table(gpl)) ## [1] 41108 17<br />
head(Table(gpl)[,c(1,6,7)]) ## run colnames(Table(gpl)) first to check which columns you need<br />
write.csv(Table(gpl)[,c(1,6,7)],"GPL6480.csv")<br />
#platformDB='hgu133plus2.db'<br />
#library(platformDB, character.only=TRUE)<br />
## GPL6102<br />
## GSE32575 must already be loaded, e.g. GSE32575 &lt;- getGEO('GSE32575', destdir=".")<br />
probeset &lt;- featureNames(GSE32575[[1]])<br />
gpl &lt;- getGEO('GPL6102', destdir=".")<br />
dim(Table(gpl)) ## number of probes and annotation columns<br />
head(Table(gpl)[,c(1,10,13)]) ## check which columns you need<br />
probe2symbol=Table(gpl)[,c(1,13)]<br />
## GPL15207 [PrimeView] Affymetrix Human Gene Expression Array<br />
## GSE58979 must already be loaded, e.g. GSE58979 &lt;- getGEO('GSE58979', destdir=".")<br />
probeset &lt;- featureNames(GSE58979[[1]])<br />
gpl &lt;- getGEO('GPL15207', destdir=".")<br />
dim(Table(gpl)) ## [1] 49395 24<br />
head(Table(gpl)[,c(1,15,19)]) ## check which columns you need<br />
probe2symbol=Table(gpl)[,c(1,15)]</p>
<p>## GPL10558 Illumina HumanHT-12 V4.0 expression beadchip<br />
library(Biobase)<br />
library(GEOquery)<br />
# Download the GPL file into the current directory and load it:<br />
gpl &lt;- getGEO('GPL10558', destdir=".")<br />
dim(Table(gpl)) ## number of probes and annotation columns<br />
head(Table(gpl)[,c(1,10,13)]) ## check which columns you need<br />
probe2symbol=Table(gpl)[,c(1,13)]</p>
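<p>Once the two relevant columns are picked, the table often still needs cleaning: some probes have no gene and some list several. The sketch below works on a made-up data frame standing in for Table(gpl), so it runs without downloading anything. The column names, the " /// " separator, and the helper probe_to_symbol are assumptions for illustration; check them against your own platform file.</p>

```r
# Toy annotation table standing in for Table(gpl); the column names and
# values are made up for illustration.
annot <- data.frame(
  ID = c("probe_1", "probe_2", "probe_3", "probe_4"),
  GENE_SYMBOL = c("TP53", "", "BRCA1 /// NBR2", "EGFR"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one row per probe/symbol pair.
probe_to_symbol <- function(tab, id_col, symbol_col, sep = " /// ") {
  ids <- tab[[id_col]]
  symbols <- tab[[symbol_col]]
  keep <- !is.na(symbols) & nzchar(symbols)            # drop unannotated probes
  parts <- strsplit(symbols[keep], sep, fixed = TRUE)  # a probe may map to several genes
  data.frame(
    probe_id = rep(ids[keep], lengths(parts)),
    symbol = unlist(parts),
    stringsAsFactors = FALSE
  )
}

probe2symbol <- probe_to_symbol(annot, "ID", "GENE_SYMBOL")
print(probe2symbol)
```

<p>Keeping one row per probe and gene pair makes it easy to merge the mapping with an expression matrix later on.</p>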
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2126.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modifying the ES score plot from the Java version of GSEA</title>
		<link>http://www.bio-info-trainee.com/2105.html</link>
		<comments>http://www.bio-info-trainee.com/2105.html#comments</comments>
		<pubDate>Thu, 01 Dec 2016 16:53:10 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[基础软件]]></category>
		<category><![CDATA[ES score]]></category>
		<category><![CDATA[GSEA]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2105</guid>
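		<description><![CDATA[First you need to understand what data is behind the ES score plot before you can modify it, because the Java &#8230; <a href="http://www.bio-info-trainee.com/2105.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>First you need to understand what data is behind the ES score plot before you can modify it. The Java version is a closed, prepackaged program, so we cannot change any parameter it does not expose. After running GSEA, the default output plot looks like this:<span id="more-2105"></span></p>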
<div style="width: 513px" class="wp-caption alignnone"><img class="" src="http://note.youdao.com/yws/api/group/23785548/noteresource/9ED49F972A0F4980AE784E76A7DFFC29/version/256?method=get-resource&amp;shareToken=DBDB0277A315444BBBAB2024190208AE&amp;entryId=123732909" alt="" width="503" height="504" /><p class="wp-caption-text">ES score</p></div>
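<p>When you prepare this image for publication you will find it is rather blurry, so you may need to redraw it yourself, which requires understanding the data behind it.</p>
<p>The bottom panel works like this: suppose the assay measured 20,000 genes. Each gene has a different difference metric between the case and control groups (there are six metrics; the default is signal2noise, for which the GSEA website gives the formula, and you can also choose the familiar fold change). The genes are sorted by this metric, and the panel shows the sorted, Z-score-standardized values.</p>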
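<p>To make the ranking step concrete, here is a minimal sketch in base R. It assumes the usual definition of signal2noise as the difference of the class means divided by the sum of the class standard deviations; the official GSEA code may apply further adjustments to the standard deviations that are left out here. The toy matrix and the function name signal2noise are made up for illustration.</p>

```r
# Toy expression matrix: 3 genes x 6 samples (3 case, then 3 control).
# All values are made up for illustration.
expr <- rbind(
  geneA = c(10, 11, 12, 4, 5, 6),
  geneB = c(5, 6, 7, 5, 6, 7),
  geneC = c(2, 3, 4, 6, 7, 8)
)
group <- rep(c("case", "control"), each = 3)

# Signal-to-noise per gene: (mean_case - mean_control) / (sd_case + sd_control).
signal2noise <- function(mat, grp, case = "case", control = "control") {
  a <- mat[, grp == case, drop = FALSE]
  b <- mat[, grp == control, drop = FALSE]
  (rowMeans(a) - rowMeans(b)) / (apply(a, 1, sd) + apply(b, 1, sd))
}

s2n <- signal2noise(expr, group)
ranked <- sort(s2n, decreasing = TRUE)            # case-enriched genes come first
ranked_z <- (ranked - mean(ranked)) / sd(ranked)  # Z-score standardization
print(ranked)
```

<p>The vector ranked_z plays the role of the sorted, standardized values drawn in the bottom panel.</p>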
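<p>The middle panel marks the positions of the gene set's members within the 20,000 measured genes, which are already sorted by signal2noise.</p>
<p>The top panel accumulates the ES score gene by gene, which is called the running ES score. The point where the running score reaches its extreme value (the largest deviation from zero) gives the gene set's final ES score!</p>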
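<p>The accumulation described above can be sketched in a few lines of base R. This is a simplified, textbook-style weighted running sum written for illustration, not the code extracted from the GSEA source; the function name running_es, the weight exponent p, and the toy inputs are assumptions.</p>

```r
# Running enrichment score along a ranked gene list (simplified sketch).
running_es <- function(ranked_stat, in_set, p = 1) {
  # ranked_stat: metric values sorted in decreasing order
  # in_set: logical vector, TRUE where the ranked gene belongs to the gene set
  hit_weight <- ifelse(in_set, abs(ranked_stat)^p, 0)
  hit_step <- hit_weight / sum(hit_weight)   # step up at gene set members
  miss_step <- (!in_set) / sum(!in_set)      # step down at all other genes
  res <- cumsum(hit_step - miss_step)        # running ES along the list
  peak <- which.max(abs(res))                # largest deviation from zero
  list(res = res, es = res[peak], position = peak)
}

# Toy ranked metric and gene set membership, made up for illustration.
ranked_stat <- c(3, 2, 1, 0.5, -0.5, -1, -2, -3)
in_set <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
out <- running_es(ranked_stat, in_set)
print(out$es)
print(out$position)
```

<p>In the plotting code, out$res corresponds to one row of Obs.RES, out$es to one entry of Obs.ES, and out$position to one entry of Obs.arg.ES.</p>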
<p>Here I fully dissected the plotting function in the R code provided on the GSEA website, as shown below:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/ES-SCORE图的画法.png"><img class="alignnone size-full wp-image-2106" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/ES-SCORE图的画法.png" alt="es-score%e5%9b%be%e7%9a%84%e7%94%bb%e6%b3%95" width="1574" height="650" /></a></p>
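<p>I also extracted the function itself:</p>
<p>This topic is a bit complicated. <strong><span style="color: #ff0000;">I have explained clearly what the data is, but how the data is produced (the txt files read by the code below)</span></strong> is something I cannot spell out in a blog post: you need to modify a 2,500-line source file to obtain the data!</p>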
<blockquote><p>setwd('data')<br />
<strong><span style="color: #ff0000;">Obs.RES=read.table('Obs.RES.txt') </span></strong><br />
<strong><span style="color: #ff0000;">Obs.RES=t(Obs.RES) ## running ES score of each gene in each gene set, a matrix</span></strong><br />
<strong><span style="color: #ff0000;">Obs.indicator=read.table('Obs.indicator.txt') </span></strong><br />
<strong><span style="color: #ff0000;">Obs.indicator=t(Obs.indicator) ## whether each gene belongs to each gene set, a 0/1 matrix</span></strong><br />
<strong><span style="color: #ff0000;">obs.s2n=read.table('obs.s2n.txt')[,1]  ## signal2noise value of each gene, already Z-scored and sorted</span></strong><br />
<strong><span style="color: #ff0000;">size.G=read.table('size.G.txt')[,1]  ## number of genes in each gene set, shown in the plot</span></strong><br />
<strong><span style="color: #ff0000;">gs.names=read.table('gs.names.txt')[,1] ## name of each gene set, shown in the plot</span></strong><br />
<strong><span style="color: #ff0000;">Obs.arg.ES=read.table('Obs.arg.ES.txt')[,1]## position in the sorted gene list where each gene set reaches its maximum ES score</span></strong><br />
<strong><span style="color: #ff0000;">Obs.ES.index=read.table('Obs.ES.index.txt')[,1]## not used here; I have forgotten what it is</span></strong><br />
<strong><span style="color: #ff0000;">Obs.ES=read.table('Obs.ES.txt')[,1]  ## maximum ES score of each gene set; a positive value is drawn in red (enriched in the case group), a negative value in blue (enriched in the control group)</span></strong></p>
<p>plot_ES_score &lt;- function(Ng=12,N=34688,phen1='control',phen2='case',Obs.RES,Obs.indicator,obs.s2n,size.G,gs.names,Obs.arg.ES,Obs.ES.index,Obs.ES){<br />
for (i in 1:Ng) {<br />
png(paste0('number_',gs.names[i],'.png'))<br />
ind &lt;- 1:N<br />
min.RES &lt;- min(Obs.RES[i,])<br />
max.RES &lt;- max(Obs.RES[i,])<br />
if (max.RES &lt; 0.3) max.RES &lt;- 0.3<br />
if (min.RES &gt; -0.3) min.RES &lt;- -0.3<br />
delta &lt;- (max.RES - min.RES)*0.50<br />
min.plot &lt;- min.RES - 2*delta<br />
max.plot &lt;- max.RES<br />
max.corr &lt;- max(obs.s2n)<br />
min.corr &lt;- min(obs.s2n)<br />
Obs.correl.vector.norm &lt;- (obs.s2n - min.corr)/(max.corr - min.corr)*1.25*delta + min.plot<br />
zero.corr.line &lt;- (- min.corr/(max.corr - min.corr))*1.25*delta + min.plot<br />
col &lt;- ifelse(Obs.ES[i] &gt; 0, 2, 4)</p>
<p># Running enrichment plot</p>
<p>sub.string &lt;- paste("Number of genes: ", N, " (in list), ", size.G[i], " (in gene set)", sep = "", collapse="")</p>
<p>main.string &lt;- paste("Gene Set ", i, ":", gs.names[i])</p>
<p>plot(ind, Obs.RES[i,], main = main.string, sub = sub.string, xlab = "Gene List Index", ylab = "Running Enrichment Score (RES)", xlim=c(1, N), ylim=c(min.plot, max.plot), type = "l", lwd = 2, cex = 1, col = col)<br />
for (j in seq(1, N, 20)) {<br />
lines(c(j, j), c(zero.corr.line, Obs.correl.vector.norm[j]), lwd = 1, cex = 1, col = colors()[12]) # shading of correlation plot<br />
}<br />
lines(c(1, N), c(0, 0), lwd = 1, lty = 2, cex = 1, col = 1) # zero RES line<br />
lines(c(Obs.arg.ES[i], Obs.arg.ES[i]), c(min.plot, max.plot), lwd = 1, lty = 3, cex = 1, col = col) # max enrichment vertical line<br />
for (j in 1:N) {<br />
if (Obs.indicator[i, j] == 1) {<br />
lines(c(j, j), c(min.plot + 1.25*delta, min.plot + 1.75*delta), lwd = 1, lty = 1, cex = 1, col = 1) # enrichment tags<br />
}<br />
}<br />
lines(ind, Obs.correl.vector.norm, type = "l", lwd = 1, cex = 1, col = 1)<br />
lines(c(1, N), c(zero.corr.line, zero.corr.line), lwd = 1, lty = 1, cex = 1, col = 1) # zero correlation horizontal line<br />
temp &lt;- order(abs(obs.s2n), decreasing=T)<br />
arg.correl &lt;- temp[N]<br />
lines(c(arg.correl, arg.correl), c(min.plot, max.plot), lwd = 1, lty = 3, cex = 1, col = 3) # zero crossing correlation vertical line</p>
<p>leg.txt &lt;- paste("\"", phen1, "\" ", sep="", collapse="")<br />
text(x=1, y=min.plot, adj = c(0, 0), labels=leg.txt, cex = 1.0)</p>
<p>leg.txt &lt;- paste("\"", phen2, "\" ", sep="", collapse="")<br />
text(x=N, y=min.plot, adj = c(1, 0), labels=leg.txt, cex = 1.0)</p>
<p>adjx &lt;- ifelse(Obs.ES[i] &gt; 0, 0, 1)</p>
<p>leg.txt &lt;- paste("Peak at ", Obs.arg.ES[i], sep="", collapse="")<br />
text(x=Obs.arg.ES[i], y=min.plot + 1.8*delta, adj = c(adjx, 0), labels=leg.txt, cex = 1.0)</p>
<p>leg.txt &lt;- paste("Zero crossing at ", arg.correl, sep="", collapse="")<br />
text(x=arg.correl, y=min.plot + 1.95*delta, adj = c(adjx, 0), labels=leg.txt, cex = 1.0)<br />
dev.off()<br />
}</p>
<p>}</p>
<p>&nbsp;</p></blockquote>
<p>With this code you can redraw the ES score plot for every current gene set. If you need to adjust the font size, tweak it in the code.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2105.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to install an unpublished package developed by someone else</title>
		<link>http://www.bio-info-trainee.com/2092.html</link>
		<comments>http://www.bio-info-trainee.com/2092.html#comments</comments>
		<pubDate>Tue, 29 Nov 2016 23:49:52 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[devtools]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[roxygen]]></category>
		<category><![CDATA[R包]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2092</guid>
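		<description><![CDATA[After finishing The ultimate solution for R packages! I thought no more questions about installing R packages would come up &#8230; <a href="http://www.bio-info-trainee.com/2092.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>After finishing <a href="http://www.biotrainee.com/thread-144-1-1.html">The ultimate solution for R packages!</a> I thought no more questions about installing R packages would come up. Looking back carefully, though, I realized I had only covered how to install officially released packages from CRAN or Bioconductor. Many people develop their own private packages, so what if we really need to use one? In that case you have to download the package and install it yourself. For example, my R package is on GitHub: <a href="https://github.com/jmzeng1314/humanid" target="_blank">https://github.com/jmzeng1314/humanid</a> <span id="more-2092"></span></p>
<p>First make sure the package is clean and safe, then make sure you really need it, because most of the time you only need a single function from it. If you do decide to install it, install git and run git clone https://github.com/jmzeng1314/humanid.git  to download the R package into a directory of your choice. If you would rather not install git, you can also download it as a zip archive from the GitHub web page.</p>
<p>The downloaded package contains a file with the .Rproj extension. Double-click it to open it, and in RStudio you can click Build to install the package, as shown:</p>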
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/R-package-install.png"><img class="alignnone size-full wp-image-2094" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/R-package-install.png" alt="r-package-install" width="541" height="227" /></a></p>
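<p>Once installation finishes, the package is loaded automatically and you can see all of its functions and data. You have successfully taken over someone else's code!</p>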
<p><img class="alignnone size-full wp-image-2093" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/R-package-help.png" alt="r-package-help" width="898" height="380" /></p>
<p>&nbsp;</p>
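<p>So can you use the package as soon as it is installed? Not quite. A properly published package usually declares all of its dependencies, so the installation keeps prompting you to install other packages. Mine does not: only when you run one of my functions does it throw an error telling you which package is missing! It is admittedly a bit silly; I was too lazy to declare the dependencies, or rather, I had not learned how yet!</p>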
<p>&gt; keggAnno()<br />
Show Traceback</p>
<p>Rerun with Debug<br />
Error in loadNamespace(name) : <strong><span style="color: #ff0000;">there is no package called ‘DT’</span> </strong><br />
&gt;</p>
<p>This does have an upside: you can always get my package installed, even though you may not be able to use it afterwards.</p>
<p>&gt; install.packages('DT')<br />
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.3/DT_0.2.zip'<br />
Content type 'application/zip' length 950203 bytes (927 KB)<br />
downloaded 927 KB</p>
<p>package ‘DT’ successfully unpacked and MD5 sums checked</p>
<p>The downloaded binary packages are in<br />
C:\Users\jimmy\AppData\Local\Temp\RtmpoT4VWl\downloaded_packages<br />
&gt;</p>
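<p>DT installs easily, and then my function works! By default keggAnno() prints nothing on success, but if you check your current directory you will find a new KEGG annotation file. The function annotates your genes of interest against the KEGG database in batch.</p>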
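<p>Instead of waiting for the error, you can check for missing packages up front. The following is a minimal sketch; missing_packages is a hypothetical helper that is not part of humanid, and the package list is only an example. Note that requireNamespace() loads a package's namespace as a side effect when the package is present.</p>

```r
# Report which of the given packages are not installed.
# `missing_packages` is a hypothetical helper, not part of humanid.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("stats", "DT")   # DT is the package named in the error message
todo <- missing_packages(needed)
if (length(todo) > 0) {
  message("Install first: ", paste(todo, collapse = ", "))
  # install.packages(todo)   # uncomment to install from CRAN
}
```

<p>Running this before calling a function from an undeclared-dependency package turns a mid-run failure into a short to-do list.</p>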
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2092.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to develop your own R package</title>
		<link>http://www.bio-info-trainee.com/2089.html</link>
		<comments>http://www.bio-info-trainee.com/2089.html#comments</comments>
		<pubDate>Tue, 29 Nov 2016 23:40:30 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[devtools]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[roxygen]]></category>
		<category><![CDATA[R包]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2089</guid>
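		<description><![CDATA[As R grows in popularity, developing an R package is no longer a skill reserved for professional programmers. What I cover here &#8230; <a href="http://www.bio-info-trainee.com/2089.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>As R grows in popularity, developing an R package is no longer a skill reserved for professional programmers. What I cover here is not how to write a package that contains complex statistical formulas or backs an SCI paper, but simply how to bundle a few of your own functions and data using RStudio's built-in package creation feature! My R package is on GitHub: <a href="https://github.com/jmzeng1314/humanid" target="_blank">https://github.com/jmzeng1314/humanid</a><span id="more-2089"></span></p>
<p>At first I searched for some material too, listed below:</p>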
<div><em><a href="https://support.rstudio.com/hc/en-us/articles/200486488-Developing-Packages-with-RStudio">https://support.rstudio.com/hc/en-us/articles/200486488-Developing-Packages-with-RStudio</a> </em></div>
<div><em>With RStudio, a few mouse clicks are enough to create your own R package!</em></div>
<div><em>First read through the following 4 tutorials on your own</em></div>
<ul>
<li><em><a href="http://cran.r-project.org/doc/manuals/R-exts.html">Writing R Extensions</a></em></li>
<li><em><a href="http://r-pkgs.had.co.nz/">R Packages (Hadley Wickham)</a></em></li>
<li><em><a href="http://cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf">Creating R Packages: A Tutorial (Friedrich Leisch)</a></em></li>
<li><em><a href="http://portal.stats.ox.ac.uk/userdata/ruth/APTS2012/Rcourse10.pdf">Making an R Package (R.M. Ripley)</a></em></li>
</ul>
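<div><em>The point is not how to create the package, but how to write the functions, the README, and the data inside it. If the documents above feel dry, you can also watch videos on YouTube; ten-odd minutes is enough to explain how to create an R package:</em></div>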
<div><em><a href="http://stat545.com/packages05_foofactors-package-02.html">http://stat545.com/packages05_foofactors-package-02.html</a></em></div>
<div></div>
<div><em>It is best to keep the R package in sync with your GitHub account; start with <a href="https://www.r-bloggers.com/rstudio-and-github/">https://www.r-bloggers.com/rstudio-and-github/</a> </em></div>
<p>&nbsp;</p>
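<p>If you do not want to read the tutorials I went through, read mine instead!</p>
<p><span style="color: #ff0000;">First install devtools and roxygen (published on CRAN as roxygen2), two packages that assist R package development. Then in RStudio's File menu choose New Project, then New Directory, then R Package, and that is it! At this point you have successfully created your own package</span>, which already ships with one function. Of course, such a package is meaningless in itself; it only shows you what developing an R package involves. You can then add your own functions and data, and to make them easy to understand you should write detailed help documentation for each function. <span style="color: #ff0000;">If your package calls other public packages, you also need to declare the dependencies clearly</span>. See the figure below:</p>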
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/R-package-create.png"><img class="alignnone size-full wp-image-2090" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/R-package-create.png" alt="r-package-create" width="948" height="518" /></a></p>
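<p>If you prefer the console to the RStudio menus, base R can produce a comparable skeleton. The sketch below uses utils::package.skeleton() rather than the RStudio wizard described above, and writes into a temporary directory; the package name demoPkg and the function hello() are hypothetical names used only for illustration.</p>

```r
# Build a throwaway package skeleton in a temporary directory with base R.
# "demoPkg" and hello() are hypothetical names used only for illustration.
pkg_env <- new.env()
pkg_env$hello <- function(name = "world") paste0("Hello, ", name, "!")

pkg_parent <- file.path(tempdir(), "pkg_demo")
dir.create(pkg_parent, showWarnings = FALSE)
suppressMessages(
  utils::package.skeleton(name = "demoPkg", list = "hello",
                          environment = pkg_env, path = pkg_parent,
                          force = TRUE)
)

pkg_dir <- file.path(pkg_parent, "demoPkg")
print(list.files(pkg_dir))   # generated files and folders of the skeleton

# Declare a dependency in DESCRIPTION so it gets installed with the package.
cat("Imports: DT\n", file = file.path(pkg_dir, "DESCRIPTION"), append = TRUE)
```

<p>The Imports line is the dependency declaration mentioned above: with it in place, installing the package also pulls in DT instead of failing later at run time.</p>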
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2089.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Trilogy on drawing network graphs in R: igraph</title>
		<link>http://www.bio-info-trainee.com/2082.html</link>
		<comments>http://www.bio-info-trainee.com/2082.html#comments</comments>
		<pubDate>Mon, 28 Nov 2016 10:08:59 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[网络分析]]></category>
		<category><![CDATA[网络图]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2082</guid>
		<description><![CDATA[Thanks to a reminder from a kind reader, I realized that my earlier trilogy on drawing network graphs in R had left out the most fundamental &#8230; <a href="http://www.bio-info-trainee.com/2082.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Thanks to a reminder from a kind reader, I realized that my earlier trilogy on drawing network graphs in R had left out the most fundamental package, igraph. Without understanding it, the other two are like water without a source.</p>
<h2><a title="详细阅读 R语言画网络图三部曲之networkD3" href="http://www.bio-info-trainee.com/1357.html" rel="bookmark">Trilogy on drawing network graphs in R: networkD3</a></h2>
<h2><a title="详细阅读 R语言画网络图三部曲之sna" href="http://www.bio-info-trainee.com/1355.html" rel="bookmark">Trilogy on drawing network graphs in R: sna</a></h2>
<div>Package manual: <a href="https://cran.r-project.org/web/packages/igraph/igraph.pdf">https://cran.r-project.org/web/packages/igraph/igraph.pdf</a></div>
<div>Package examples: <a href="https://www.r-project.org/conferences/useR-2008/slides/Csardi.pdf">https://www.r-project.org/conferences/useR-2008/slides/Csardi.pdf</a></div>
<div>Package functions: <a href="http://igraph.org/r/doc/">http://igraph.org/r/doc/</a></div>
<div>PPI example: <a href="http://a-little-book-of-r-for-bioinformatics.readthedocs.io/en/latest/src/chapter11.html">http://a-little-book-of-r-for-bioinformatics.readthedocs.io/en/latest/src/chapter11.html</a></div>
<div>It actually involves 3 packages: igraph/RBGL/Rgraphviz</div>
<div>It uses a test data set, a prebuilt PPI network object: We will first analyse a curated data set of protein-protein interactions in the yeast Saccharomyces cerevisiae extracted from published papers. This data set comes with an R package called “yeastExpData”, which calls the data set “litG”. This data was first described in a paper by Ge et al (2001) in Nature Genetics (<a href="http://www.nature.com/ng/journal/v29/n4/full/ng776.html">http://www.nature.com/ng/journal/v29/n4/full/ng776.html</a>).</div>
<div></div>
<p><span id="more-2082"></span></p>
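<div><span style="color: #ff0000;">The key points are how to construct a graphNEL graph object and how to process it with functions!</span></div>
<div><span style="color: #ff0000;">As for construction, remember that building the network object is the crux: graph.data.frame + as_graphnel is all it takes. A whole family of packages built on network objects requires this step; once you have learned it, you are all set!</span></div>
<div>Read the PPI data into a data.frame, for example my_edges</div>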
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/clipboard1.png"><img class="alignnone size-full wp-image-2083" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/clipboard1.png" alt="clipboard" width="359" height="124" /></a></div>
<div>tmp &lt;- graph.data.frame(my_edges)</div>
<div>tmp;summary(tmp)</div>
<div>plot(tmp, layout=layout.kamada.kawai)</div>
<div>subnet &lt;- as_graphnel(tmp)</div>
<div>At this point subnet is a network object!</div>
<div>&gt; subnet</div>
<div>A graphNEL graph with directed edges</div>
<div>Number of Nodes = 818</div>
<div>Number of Edges = 12249</div>
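<div>To make the idea of a "node and edge list" object more concrete, here is a minimal base R sketch of what such an object has to store: the set of nodes, and for each node the targets of its edges. It is only an illustration under assumptions. The toy edge table and the names node_names and edge_list are hypothetical, and this is not how igraph or graphNEL are implemented internally.</div>
<pre>
```r
# Toy edge table standing in for my_edges (hypothetical data, not the PPI set)
my_edges <- data.frame(from = c("A", "A", "B", "D"),
                       to   = c("B", "C", "C", "E"),
                       stringsAsFactors = FALSE)

# Node set: every name that appears in either column
node_names <- sort(unique(c(my_edges$from, my_edges$to)))

# Edge list: for each node, the targets of its outgoing edges
# (nodes without outgoing edges get an empty character vector)
edge_list <- split(my_edges$to, factor(my_edges$from, levels = node_names))

length(node_names)   # number of nodes
nrow(my_edges)       # number of directed edges
edge_list[["A"]]     # targets reachable from A in one step
```
</pre>
<div>The two counts printed at the end correspond to the "Number of Nodes" and "Number of Edges" lines that the real object reports.</div>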
<div>With this network object in hand, BioNet can be used to find the <a href="http://www.bio-info-trainee.com/2071.html">maximal-scoring subgraph</a>.</div>
<div></div>
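<div>Other functions for working with a network object (litG, the yeast PPI network from yeastExpData, is used below):</div>
<div>library("graph"); library("yeastExpData"); data("litG")</div>
<div>mynodes &lt;- nodes(litG) ## all the nodes in the network</div>
<div>adj(litG, "YBR009C") ## the nodes adjacent to YBR009C, i.e. the other ends of all its edges</div>
<div>mydegrees &lt;- graph::degree(litG) ## the degree of every node in the network</div>
<div>table(mydegrees);mean(mydegrees);hist(mydegrees, col="red") ## look at the distribution of the degrees</div>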
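<div>As a rough illustration of what a degree is, the sketch below counts degrees directly from a small undirected edge table in base R. The edge table and the name toy_degrees are made up for this example; with a real graphNEL object you would keep using graph::degree as above.</div>
<pre>
```r
# Toy undirected edge table (hypothetical node names)
edges <- data.frame(a = c("n1", "n1", "n2", "n4"),
                    b = c("n2", "n3", "n3", "n5"),
                    stringsAsFactors = FALSE)

# A node's degree is the number of edge endpoints that carry its name
endpoint_counts <- table(c(edges$a, edges$b))
toy_degrees <- setNames(as.integer(endpoint_counts), names(endpoint_counts))

toy_degrees          # degree per node
table(toy_degrees)   # how many nodes have each degree
mean(toy_degrees)    # average degree
```
</pre>
<div>Every edge contributes two endpoints, so the degrees always sum to twice the number of edges, which is a handy sanity check on real data too.</div>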
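<div>In a large network not all nodes are connected to each other; the RBGL package can show which nodes are cut off from the rest.</div>
<div>library("RBGL")</div>
<div>myconnectedcomponents &lt;- connectedComp(litG)</div>
<div>Each element of the returned list myconnectedcomponents holds the node names of one connected component. You can look for the largest component, or search the list for the component that contains a particular node.</div>
<div>component3 &lt;- myconnectedcomponents[[3]]</div>
<div>mysubgraph &lt;- subGraph(component3, litG) ## extracts the chosen component as a graphNEL object; in effect it takes the subnetwork induced by the given nodes</div>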
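<div>For readers who want to see what "connected components" means operationally, here is a small base R sketch that finds them with a breadth-first search over a toy edge table. It is an assumption-laden illustration, not the algorithm RBGL uses; the data and the names neighbours, components and largest are hypothetical.</div>
<pre>
```r
# Toy undirected edge table (hypothetical node names)
edges <- data.frame(a = c("n1", "n2", "n4"),
                    b = c("n2", "n3", "n5"),
                    stringsAsFactors = FALSE)
all_nodes <- sort(unique(c(edges$a, edges$b)))

# Neighbours of one node, looking at both edge directions
neighbours <- function(node) {
  unique(c(edges$b[edges$a == node], edges$a[edges$b == node]))
}

# Breadth-first search from every node that has not been visited yet
components <- list()
visited <- character(0)
for (start in all_nodes) {
  if (start %in% visited) next
  queue <- start
  members <- character(0)
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% members) next
    members <- c(members, current)
    queue <- c(queue, setdiff(neighbours(current), members))
  }
  visited <- c(visited, members)
  components[[length(components) + 1]] <- sort(members)
}

# Largest component: the list element with the most node names
largest <- components[[which.max(lengths(components))]]
largest
```
</pre>
<div>As with connectedComp, the result is a list of character vectors of node names, so picking the largest component is just a matter of comparing lengths.</div>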
<div>The following code draws the network:</div>
<div>library("Rgraphviz")</div>
<div>mysubgraph &lt;- subGraph(component3, litG)</div>
<div>mygraphplot &lt;- layoutGraph(mysubgraph, layoutType="neato")</div>
<div>renderGraph(mygraphplot)</div>
<div></div>
<div>You can also look for communities in a network, which is another term from network research: <a href="http://en.wikipedia.org/wiki/Community_structure">http://en.wikipedia.org/wiki/Community_structure</a></div>
<div>Clustering is possible as well, along with much more that I will not go through one by one.</div>
<div>The connected component used above is also a term from network research: <a href="http://en.wikipedia.org/wiki/Connected_component_(graph_theory)">http://en.wikipedia.org/wiki/Connected_component_(graph_theory)</a></div>
<div></div>
<div></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2082.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding the maximal-scoring subgraph with the Bioconductor package BioNet</title>
		<link>http://www.bio-info-trainee.com/2071.html</link>
		<comments>http://www.bio-info-trainee.com/2071.html#comments</comments>
		<pubDate>Fri, 25 Nov 2016 14:54:20 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[BioNet]]></category>
		<category><![CDATA[Network analysis]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2071</guid>
		<description><![CDATA[## This package tackles a hard problem: the maximal-scoring subgraph &#8230; <a href="http://www.bio-info-trainee.com/2071.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>## This package tackles a hard problem, the maximal-scoring subgraph (MSS) problem: finding significantly differentially expressed subnetworks inside a huge, complex network. In practice you obtain a few hundred differentially expressed genes, build a network from a PPI database, and find that it is still enormous, so this package is used to trim the network down.</div>
<div>The search is done heuristically: it aims for a good approximation of the optimal subnetwork rather than a guaranteed optimum.</div>
<div>## The package can also integrate several kinds of results to score a network.</div>
<div>Package home page: <a href="https://www.bioconductor.org/packages/release/bioc/html/BioNet.html">https://www.bioconductor.org/packages/release/bioc/html/BioNet.html</a></div>
<div>paper：<a href="http://bioinformatics.oxfordjournals.org/content/early/2010/02/25/bioinformatics.btq089">BioNet: an R-Package for the Functional Analysis of ... - Bioinformatics</a></div>
<div>It combines PPI network analysis with the search for functional modules.</div>
<div>Script: <a href="https://www.bioconductor.org/packages/release/bioc/vignettes/BioNet/inst/doc/Tutorial.R">https://www.bioconductor.org/packages/release/bioc/vignettes/BioNet/inst/doc/Tutorial.R</a></div>
<div>Tutorial: <a href="https://www.bioconductor.org/packages/release/bioc/vignettes/BioNet/inst/doc/Tutorial.pdf">https://www.bioconductor.org/packages/release/bioc/vignettes/BioNet/inst/doc/Tutorial.pdf</a></div>
<div>The core idea is to find the maximal-scoring subgraph from an "igraph" or "graphNEL" object plus node scores:</div>
<div>subnet &lt;- subNetwork(dataLym$label, interactome)</div>
<div>module &lt;- runFastHeinz(subnet, scores)</div>
<div>plotModule(module, scores=scores, diff.expr=logFC) ## this is our trimmed-down network</div>
<div>Another function, dNetFind, offers similar functionality: <a href="https://rdrr.io/cran/dnet/man/dNetFind.html">https://rdrr.io/cran/dnet/man/dNetFind.html</a></div>
<div></div>
<p><span id="more-2071"></span></p>
<div>## Every network used here must be a graph object, either in graphNEL or igraph format.</div>
<div>## First load the packages and the built-in data</div>
<div></div>
<div>library(BioNet)</div>
<div>library(DLBCL)</div>
<div>data(dataLym)</div>
<div>data(interactome)</div>
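<div>## dataLym holds, for every gene, the p-values of three sets of results: t, s and o</div>
<div>## interactome is a built-in PPI network object from which a subnetwork can be extracted for a given gene list</div>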
<div></div>
<div>pvals &lt;- cbind(t=dataLym$t.pval, s=dataLym$s.pval)</div>
<div>rownames(pvals) &lt;- dataLym$label</div>
<div>pval &lt;- aggrPvals(pvals, order=2, plot=FALSE)</div>
<div></div>
<div>## Take the t and s p-values and combine them into a single p-value per gene with aggrPvals</div>
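<div>As far as I understand the BioNet documentation, aggrPvals combines the p-values through order statistics: for each gene it takes the i-th smallest p-value and evaluates it against the distribution that value would follow if all p-values were uniform. The base R sketch below illustrates that idea only; aggregate_by_order and the toy p-values are hypothetical and this is not the package's own code.</div>
<pre>
```r
# Sketch of order-statistic p-value aggregation (not BioNet's own code)
aggregate_by_order <- function(pval_matrix, order) {
  n <- ncol(pval_matrix)
  # i-th smallest p-value in each row
  ith <- apply(pval_matrix, 1, function(p) sort(p)[order])
  # Under the null, the i-th order statistic of n uniforms is Beta(i, n - i + 1)
  pbeta(ith, shape1 = order, shape2 = n - order + 1)
}

# Hypothetical p-values for three genes from two analyses
toy_pvals <- cbind(t = c(0.10, 0.001, 0.50),
                   s = c(0.20, 0.020, 0.04))
rownames(toy_pvals) <- c("geneA", "geneB", "geneC")

toy_aggr <- aggregate_by_order(toy_pvals, order = 2)
toy_aggr
```
</pre>
<div>With two columns and order = 2 the aggregated value is simply the square of the larger p-value, so a gene only gets a small combined p-value when both analyses agree.</div>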
<div></div>
<div>subnet &lt;- subNetwork(dataLym$label, interactome)</div>
<div>subnet &lt;- rmSelfLoops(subnet)</div>
<div>subnet</div>
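<div>## The network is extracted for the genes in dataLym$label. The labels look a little odd, e.g. TP53(7157): a gene symbol followed by the Entrez ID in parentheses.</div>
<div>## rmSelfLoops is a standard step: run it on any network to remove self-loops</div>
<div>## Because the gene list in dataLym$label is limited, the extracted network usually has only a couple of thousand nodes and on the order of ten thousand edges</div>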
<div></div>
<div>fb &lt;- fitBumModel(pval, plot=FALSE)</div>
<div>## Fit a Beta-Uniform-Mixture (BUM) model to the aggregated p-values of the genes.</div>
<div>scores &lt;- scoreNodes(subnet, fb, fdr=0.001)</div>
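<div>To give a feel for what the BUM fit and the node scores represent, here is a hedged base R sketch of the formulas as I understand them from the BioNet paper: a mixture density of a uniform and a Beta(a, 1) component, a p-value threshold for a chosen FDR, and a score that is positive below that threshold. The function names and the values of lambda and a are assumptions for illustration; in practice fitBumModel estimates the parameters and scoreNodes does the scoring.</div>
<pre>
```r
# BUM density: uniform noise (weight lambda) plus a Beta(a, 1) signal component
bum_density <- function(x, lambda, a) {
  lambda + (1 - lambda) * a * x^(a - 1)
}

# P-value threshold tau that corresponds to the chosen FDR under the BUM model
fdr_threshold <- function(fdr, lambda, a) {
  pi_hat <- lambda + (1 - lambda) * a
  ((pi_hat - fdr * lambda) / (fdr * (1 - lambda)))^(1 / (a - 1))
}

# Node score: positive when the p-value is below the threshold, negative above
node_score <- function(pval, fdr, lambda, a) {
  (a - 1) * (log(pval) - log(fdr_threshold(fdr, lambda, a)))
}

# Assumed parameter values, for illustration only
lambda <- 0.5
a <- 0.2
tau <- fdr_threshold(0.001, lambda, a)
toy_scores <- node_score(c(tau / 10, tau, 0.5), fdr = 0.001, lambda = lambda, a = a)
toy_scores
```
</pre>
<div>A stricter fdr lowers the threshold, so fewer nodes receive a positive score and the resulting module tends to be smaller.</div>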
<div></div>
<div>module &lt;- runFastHeinz(subnet, scores)</div>
<div>## Here we use a fast heuristic approach to calculate an approximation to the optimal scoring subnetwork.</div>
<div>logFC &lt;- dataLym$diff</div>
<div>names(logFC) &lt;- dataLym$label</div>
<div></div>
<div>plotModule(module, scores=scores, diff.expr=logFC)</div>
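<div>## diff.expr is used to colour the nodes</div>
<div>## scores is used to set the shape of the nodes</div>
<div>## plotModule is a customised plotting function for graphNEL or igraph objects; you could also plot directly with the igraph package.</div>
<div>## The network can also be exported to a Cytoscape format and drawn in Cytoscape</div>
<div>## In general, red marks up-regulated genes and green down-regulated genes; circles are nodes with a positive score and squares are nodes with a negative score</div>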
<div></div>
<div></div>
<div>## Below is a practical example of how to run a network analysis with the BioNet package</div>
<div>library(BioNet)</div>
<div>library(DLBCL)</div>
<div>data(exprLym)</div>
<div>data(interactome)</div>
<div>exprLym ## a built-in object, so its gene labels already match what interactome expects</div>
<div>interactome</div>
<div>network &lt;- subNetwork(featureNames(exprLym), interactome)</div>
<div>network</div>
<div>network &lt;- largestComp(network)</div>
<div>## The function extracts the largest component of a network</div>
<div>network</div>
<div></div>
<div>library(genefilter)</div>
<div>library(impute)</div>
<div>expressions &lt;- impute.knn(exprs(exprLym))$data</div>
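<div>## The matrix returned by exprs() contains missing values, so impute.knn is used to impute the missing expression data</div>
<div>## Differential expression is tested here with the rowttests function from the genefilter package</div>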
<div>t.test &lt;- rowttests(expressions, fac=exprLym$Subgroup)</div>
<div>t.test[1:10, ]</div>
<div>data(dataLym)</div>
<div></div>
<div>ttest.pval &lt;- t.test[, "p.value"]</div>
<div>surv.pval &lt;- dataLym$s.pval</div>
<div>names(surv.pval) &lt;- dataLym$label</div>
<div>pvals &lt;- cbind(ttest.pval, surv.pval)</div>
<div>pval &lt;- aggrPvals(pvals, order=2, plot=FALSE)</div>
<div>fb &lt;- fitBumModel(pval, plot=FALSE)</div>
<div>fb</div>
<div>## Plots that show what fitBumModel has actually fitted</div>
<div>dev.new(width=13, height=7)</div>
<div>par(mfrow=c(1,2))</div>
<div>hist(fb)</div>
<div>plot(fb)</div>
<div>dev.off()</div>
<div></div>
<div>## The plot below shows the log-likelihood surface over the two parameters of the Beta-Uniform-Mixture (BUM) model</div>
<div>plotLLSurface(pval, fb)</div>
<div></div>
<div>scores &lt;- scoreNodes(network=network, fb=fb, fdr=0.001)</div>
<div>## Score each node based on its p-value</div>
<div></div>
<div>network &lt;- rmSelfLoops(network)</div>
<div></div>
<div>## Next, write the network to text files; these node and edge files are the input for heinz</div>
<div>writeHeinzEdges(network=network, file="lymphoma_edges_001", use.score=FALSE)</div>
<div>writeHeinzNodes(network=network, file="lymphoma_nodes_001", node.scores = scores)</div>
<div></div>
<div>datadir &lt;- file.path(path.package("BioNet"), "extdata")</div>
<div>dir(datadir)</div>
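<div>## The algorithm changes here: the heinz algorithm is used to calculate the maximum-scoring subnetwork</div>
<div>## The file read below has to be generated with the heinz.py script; this example uses the result file shipped with the package</div>
<div>## The command is: heinz.py -e lymphoma_edges_001.txt -n lymphoma_nodes_001.txt -N True -E False</div>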
<div></div>
<div>module &lt;- readHeinzGraph(node.file=file.path(datadir, "lymphoma_nodes_001.txt.0.hnz"), network=network)</div>
<div>diff &lt;- t.test[, "dm"]</div>
<div>names(diff) &lt;- rownames(t.test)</div>
<div></div>
<div>plotModule(module, diff.expr=diff, scores=scores)</div>
<div></div>
<div>sum(scores[nodes(module)])</div>
<div>sum(scores[nodes(module)]&gt;0)</div>
<div>sum(scores[nodes(module)]&lt;0)</div>
<div></div>
<div></div>
<div>## Another example: the ALL data set</div>
<div>library(BioNet)</div>
<div>library(DLBCL)</div>
<div>library(ALL)</div>
<div>data(ALL)</div>
<div>data(interactome)</div>
<div>## ALL comes from another package; its features are probe IDs rather than gene IDs, so they have to be mapped to identifiers that BioNet recognises!</div>
<div>mapped.eset &lt;- mapByVar(ALL, network=interactome, attr="geneID")</div>
<div>mapped.eset[1:5,1:5]</div>
<div>length(intersect(rownames(mapped.eset), nodes(interactome)))</div>
<div>network &lt;- subNetwork(rownames(mapped.eset), interactome)</div>
<div>network</div>
<div>network &lt;- largestComp(network)</div>
<div>network &lt;- rmSelfLoops(network)</div>
<div>network</div>
<div></div>
<div>## limma is used for the differential expression analysis here</div>
<div>library(limma)</div>
<div>design &lt;- model.matrix(~ -1+ factor(c(substr(unlist(ALL$BT), 0, 1))))</div>
<div>colnames(design)&lt;- c("B", "T")</div>
<div>contrast.matrix &lt;- makeContrasts(B-T, levels=design)</div>
<div>contrast.matrix</div>
<div>fit &lt;- lmFit(mapped.eset, design)</div>
<div>fit2 &lt;- contrasts.fit(fit, contrast.matrix)</div>
<div>fit2 &lt;- eBayes(fit2)</div>
<div>pval &lt;- fit2$p.value[,1]</div>
<div>fb &lt;- fitBumModel(pval, plot=FALSE)</div>
<div>fb</div>
<div>dev.new(width=13, height=7)</div>
<div>par(mfrow=c(1,2))</div>
<div>hist(fb)</div>
<div>plot(fb)</div>
<div>scores &lt;- scoreNodes(network=network, fb=fb, fdr=1e-14)</div>
<div>## Again, write the network to local files as input for heinz</div>
<div>writeHeinzEdges(network=network, file="ALL_edges_001", use.score=FALSE)</div>
<div>writeHeinzNodes(network=network, file="ALL_nodes_001", node.scores = scores)</div>
<div>## Again, the heinz algorithm is used to calculate the maximum-scoring subnetwork</div>
<div>## A new implementation, Heinz v2.0, is also available at https://software.cwi.nl/software/heinz</div>
<div></div>
<div>datadir &lt;- file.path(path.package("BioNet"), "extdata")</div>
<div>module &lt;- readHeinzGraph(node.file=file.path(datadir, "ALL_nodes_001.txt.0.hnz"), network=network)</div>
<div></div>
<div>nodeDataDefaults(module, attr="diff") &lt;- ""</div>
<div>nodeData(module, n=nodes(module), attr="diff") &lt;- fit2$coefficients[nodes(module),1]</div>
<div>nodeDataDefaults(module, attr="score") &lt;- ""</div>
<div>nodeData(module, n=nodes(module), attr="score") &lt;- scores[nodes(module)]</div>
<div>nodeData(module)[1]</div>
<div></div>
<div>## Save as an XGMML file for use in Cytoscape</div>
<div>saveNetwork(module, file="ALL_module", type="XGMML")</div>
<div></div>
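<div><span style="color: #ff0000;">## In general, red marks up-regulated genes and green down-regulated genes; circles are nodes with a positive score and squares are nodes with a negative score</span></div>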
<div></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2071.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL tables turn out to have a maximum number of columns</title>
		<link>http://www.bio-info-trainee.com/1988.html</link>
		<comments>http://www.bio-info-trainee.com/1988.html#comments</comments>
		<pubDate>Mon, 07 Nov 2016 12:19:33 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[Cancer types]]></category>
		<category><![CDATA[Web pages]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1988</guid>
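		<description><![CDATA[I wanted to write the TCGA RPKM matrix into MySQL and then build a query web page for biologists &#8230; <a href="http://www.bio-info-trainee.com/1988.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>I wanted to write the TCGA RPKM matrix into MySQL and then build a query web page for biologists. The data I downloaded is <a href="http://www.bio-info-trainee.com/1274.html">GSE62944, the collection of all TCGA mRNA expression data</a>, with expression data for 9264 tumor samples and 741 normal-tissue samples. When I tried to write the tumor expression matrix, I got an error:</div>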
<div>Error in .local(conn, statement, ...) :</div>
<div>could not run statement: Too many columns</div>
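<div>A quick search showed that MySQL limits the number of columns in a table (the manual page below gives a hard limit of 4096 columns per table, with lower limits for some storage engines). I am not much of a computer person and could not work out how to adjust any setting to raise that limit: <a href="http://dev.mysql.com/doc/refman/5.7/en/column-count-limit.html">http://dev.mysql.com/doc/refman/5.7/en/column-count-limit.html</a> So I split the tumor expression matrix by cancer type. The number of samples per cancer type is:</div>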
<p><span id="more-1988"></span></p>
<div>table(tumorCancerType2amples$CancerType)</div>
<div></div>
<div>ACC BLCA BRCA CESC COAD DLBC GBM HNSC KICH KIRC KIRP LAML LGG LIHC LUAD LUSC OV PRAD READ SKCM STAD THCA UCEC UCS</div>
<div>79 414 1119 306 483 48 170 504 66 542 291 178 532 374 541 502 430 502 167 472 420 513 554 57</div>
<div></div>
<div>Each cancer type is then written to MySQL separately. The solution and code are given below:</div>
<div></div>
<blockquote>
<div>library(RMySQL)</div>
<div>## con must be an open MySQL connection; the dbname, user and password here are placeholders</div>
<div>con=dbConnect(MySQL(), dbname='tcga', user='user', password='password', host='localhost')</div>
<div>tumorRPKM=read.table('GSM1536837_06_01_15_TCGA_24.tumor_Rsubread_FPKM.txt.gz',sep = '\t',stringsAsFactors = FALSE,header = TRUE)</div>
<div>colnames(tumorRPKM)[1]='geneSymbol'</div>
<div>rownames(tumorRPKM)=tumorRPKM$geneSymbol</div>
<div>tumorRPKM=tumorRPKM[,-1]</div>
<div>## round to three decimals, then put the gene symbols back as a column</div>
<div>tumorRPKM=round( as.matrix(tumorRPKM),3)</div>
<div>tumorRPKM=as.data.frame(tumorRPKM)</div>
<div>tumorRPKM$geneSymbol = rownames(tumorRPKM)</div>
<div>#load(file = 'tumorRPKM.rData')</div>
<div>tumorCancerType2amples=read.table('GSE62944_06_01_15_TCGA_24_CancerType_Samples.txt',sep = '\t',stringsAsFactors = FALSE)</div>
<div>colnames(tumorCancerType2amples)=c('sampleID','CancerType')</div>
<div>## write one table per cancer type, named tumor_&lt;CancerType&gt;_RPKM</div>
<div>lapply(unique((tumorCancerType2amples$CancerType)), function(x){</div>
<div>#x='PRAD';</div>
<div>sampleList=tumorCancerType2amples[tumorCancerType2amples$CancerType==x,1]</div>
<div>## read.table turned "-" in the sample names into ".", so do the same here</div>
<div>sampleList=gsub("-",".", sampleList)</div>
<div>tmpMatrix=tumorRPKM[,c('geneSymbol',sampleList)]</div>
<div>dbWriteTable(con, paste('tumor',x,'RPKM',sep='_'), tmpMatrix, append=FALSE,row.names=FALSE)</div>
<div></div>
<div>})</div>
</blockquote>
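<div>The dbWriteTable function comes from the RMySQL package and needs an open connection to the MySQL database (con in the code above); without both, the code will not run.</div>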
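<div>Splitting by cancer type works as long as each type stays under the column limit. If a single group were still too wide, the same idea could be applied once more by cutting the sample columns into fixed-size chunks. The base R sketch below shows this on a toy table; toy_expr, max_cols and chunks are hypothetical names, the cap of 4 columns is only for illustration, and no database is touched.</div>
<pre>
```r
# Toy expression table: one geneSymbol column plus six sample columns
toy_expr <- data.frame(geneSymbol = c("TP53", "EGFR"),
                       s1 = c(1.5, 2.0), s2 = c(0.3, 4.1), s3 = c(2.2, 0.0),
                       s4 = c(5.0, 1.1), s5 = c(0.7, 0.9), s6 = c(3.3, 2.8),
                       stringsAsFactors = FALSE)

# Hypothetical cap on columns per table, including the geneSymbol key column
max_cols <- 4

sample_cols <- setdiff(colnames(toy_expr), "geneSymbol")
# Assign each sample column to a chunk of at most (max_cols - 1) samples
chunk_id <- ceiling(seq_along(sample_cols) / (max_cols - 1))
chunks <- lapply(split(sample_cols, chunk_id), function(cols) {
  toy_expr[, c("geneSymbol", cols), drop = FALSE]
})

names(chunks)          # one name per table that would be written
sapply(chunks, ncol)   # every chunk stays within max_cols
```
</pre>
<div>Each element of chunks could then be passed to dbWriteTable under its own table name, in the same way as the per-cancer-type tables above; keeping geneSymbol in every chunk means the pieces can be joined again on that column.</div>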
<div>The tables written to the database look like this:</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/tmp.png"><img class="alignnone size-full wp-image-1989" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/tmp.png" alt="tmp" width="418" height="559" /></a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1988.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
