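Jul 08

Standard Genome Annotation Files: the GENCODE Database

Posted on July 8, 2016 by ulwvfje

The GENCODE database is an offshoot of the ENCODE project, curated and maintained by the well-known Sanger Institute. It records functional annotation of the genome: which protein-coding genes, pseudogenes, and lncRNA genes sit on each chromosome, where they are located, and the coordinates of each gene's exons, introns, and UTRs. I used to download these annotations from the Ensembl FTP server at EBI; only later did I discover GENCODE, which I now treat as the gold standard.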
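To make the kind of information in these annotation files concrete, here is a minimal sketch that counts genes per biotype from GTF-formatted lines. It assumes the standard nine tab-separated GTF columns and the `gene_type` attribute key that GENCODE GTF files use. The function names and the toy records are made up for illustration and are not real GENCODE entries.

```python
from collections import Counter


def parse_attributes(field):
    """Turn a GTF attribute column into a dict (first value wins for repeated keys)."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs.setdefault(key, value.strip('"'))
    return attrs


def count_gene_types(lines):
    """Count 'gene' features per gene_type in an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        # Columns: seqname, source, feature, start, end, score, strand, frame, attributes
        if len(cols) != 9 or cols[2] != "gene":
            continue
        counts[parse_attributes(cols[8]).get("gene_type", "unknown")] += 1
    return counts


# Toy records for illustration only (hypothetical IDs and coordinates).
example = [
    "##description: toy example\n",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding"; gene_name "TOY1";\n',
    'chr1\tHAVANA\texon\t100\t300\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding"; gene_name "TOY1";\n',
    'chr2\tHAVANA\tgene\t50\t400\t.\t-\t.\tgene_id "G2"; gene_type "processed_pseudogene"; gene_name "TOY1P1";\n',
]
print(count_gene_types(example))
```

For a real file you would pass an open file handle (for example from `gzip.open(path, "rt")`) instead of the toy list; the path is left to the reader.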

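May 16

Pseudogene Resource Hub

Posted on May 16, 2016 by ulwvfje

A pseudogene is a gene that could once be translated into protein but has lost that function through accumulated mutations.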

For example:

PTEN-->PTENP1

KRAS-->KRASP1

NANOG-->NANOGP1

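This is easy to follow: as a rule of thumb, a symbol that ends in P1 or a similar suffix marks a pseudogene. More than ten thousand pseudogenes are currently catalogued, and I usually take http://www.genenames.org/cgi-bin/statistics (the HUGO Gene Nomenclature Committee) as my reference.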
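The naming pattern in the examples above (parent symbol, then `P`, then a number) can be sketched as a regular expression. This is only a heuristic under that assumption: `guess_parent` is a hypothetical helper, and ordinary genes such as MMP1 also fit the pattern, so any hit should be confirmed against the locus type recorded by HGNC or GENCODE.

```python
import re

# Heuristic only: parent symbol + "P" + number, e.g. PTEN -> PTENP1.
PSEUDOGENE_LIKE = re.compile(r"^(?P<parent>[A-Z0-9]+?)P(?P<number>\d+)$")


def guess_parent(symbol):
    """Return the presumed parent symbol, or None if the pattern does not match."""
    match = PSEUDOGENE_LIKE.match(symbol)
    return match.group("parent") if match else None


for symbol in ["PTENP1", "KRASP1", "NANOGP1", "PTEN", "MMP1"]:
    # MMP1 is a real protein-coding gene, yet it matches: the pattern alone is not proof.
    print(symbol, guess_parent(symbol))
```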

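For research you may need something more comprehensive, so I searched on Google and found a fairly complete collection.

It is http://pseudogene.org/Human/ (the central site).

It currently centers on two sets: the ENCODE project's GENCODE 21 annotation, and Yale's PseudoPipe results on Ensembl genome release 79.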

Human Pseudogene Annotation

GENCODE Annotation

- Data: The current human pseudogene annotation is in GENCODE 21.

- Description: The GENCODE annotation of pseudogenes contains models that have been created by the Human and Vertebrate Analysis and Annotation (HAVANA) team, an expert manual annotation team at the Wellcome Trust Sanger Institute. This is informed by, and checked against, computational pseudogene predictions by the PseudoPipe and RetroFinder pipelines.

PseudoPipe Output

- Data: The current PseudoPipe results are on Ensembl genome release 79.

- Description: Genome-wide human pseudogene annotation predicted by PseudoPipe. PseudoPipe is a homology-based computational pipeline that searches a mammalian genome and identifies pseudogene sequences.

- Reference:

Other Human Pseudogene Sets

- Data: .

- Description: Archived pseudogene annotation on previous human genome releases from PseudoPipe. Genome-wide annotation or specific subset.

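May 16

TCGA Data-Mining Paper Series: Exploring Pseudogenes

Posted on May 16, 2016 by ulwvfje

This is one of the papers in the TCGA data-mining series: a purely bioinformatic data-analysis study led by Han Liang at MD Anderson Cancer Center.

Paper: http://www.nature.com/ncomms/2014/140707/ncomms4963/full/ncomms4963.html

TCGA now holds a very substantial amount of data, with more than ten thousand tumor samples. This pseudogene paper was published in 2014, so the authors analyzed only 2,808 samples covering just 7 cancer types.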