<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; driver_genes</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/driver_genes/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>You are welcome to join the discussion on the forum at biotrainee.com, or to follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Literature-notes-2010-R-software-identify-cancer_driver_genes</title>
		<link>http://www.bio-info-trainee.com/966.html</link>
		<comments>http://www.bio-info-trainee.com/966.html#comments</comments>
		<pubDate>Mon, 31 Aug 2015 05:33:26 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[driver_genes]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=966</guid>
		<description><![CDATA[We used data from 188 non-small cell lung tumors to test a &#8230; <a href="http://www.bio-info-trainee.com/966.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
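				<content:encoded><![CDATA[<p>We used data from 188 non-small cell lung tumors to test an R program that finds driver genes in cancer.<br />
The software can be downloaded from: http://linus.nci.nih.gov/Data/YounA/software.zip<br />
It is an R program that ships with a readme, and it is simple to use.<br />
Prepare two files, silent_mutation_table.txt and nonsilent_mutation_table.txt. Both are plain-text tables in the format shown below: format the SNPs you have called and split them into silent and non-silent according to the annotation results.<br />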
#Ensembl_gene_id Chromosome Start_position Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 Tumor_Sample_Barcode<br />
#ENSG00000122477 1 100390656 SNP G G A TCGA-23-1022-01A-02W-0488-09<br />
Then run the package's main script directly from within R: source("main_R_script.r")<br />
We reanalyzed sequence data for 623 candidate genes in 188 non-small cell lung tumors using the new method.<br />
The aim is to identify genes that are frequently mutated and are therefore expected to have primary roles in the development of tumors.<br />
To find these driver genes, each gene is tested for whether its mutation rate is significantly higher than the background (or passenger) mutation rate.</p>
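<p>As a rough illustration of the test described above (not the method implemented in the software package), a one-sided binomial test can compare a gene's observed mutation count with the count that a fixed background rate would predict. The gene length, mutation count and background rate below are made-up values.</p>

```r
# Hypothetical numbers, chosen only for illustration.
n_samples = 188       # tumors sequenced
gene_length = 1500    # coding bases covered in this gene (assumed)
bmr = 1e-6            # assumed background mutation rate per base
observed = 5          # mutations observed in the gene across all samples

# Number of base positions at which a background mutation could have occurred.
n_trials = n_samples * gene_length

# One-sided test: is the gene's mutation rate greater than the background rate?
result = binom.test(observed, n_trials, p = bmr, alternative = "greater")
p_value = result$p.value
print(p_value)
```

<p>A small p-value suggests that the gene is mutated more often than the background alone would explain. A single fixed rate ignores the per-sample and per-mutation-type differences discussed in the notes that follow.</p>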
<p>Some investigators (Sjoblom et al., 2006) further divide mutations into several types according to the nucleotide and the neighboring nucleotides of the mutations.</p>
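<p>Three shortcomings of the method of Ding et al. (2008), which ignores the following:<br />
1. Different types of mutations can have different impacts on proteins. (The more a mutation affects protein function, the more likely it is to be a driver mutation.)<br />
2. Different samples have different background mutation rates. (A mutation in a sample with a high background mutation rate may well be a product of that high background rather than of the cancer.)<br />
3. A different number of non-silent mutations can occur at each base pair according to the genetic code. (For example, tryptophan has only one codon, whereas arginine has as many as six.)</p>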
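<p>Point 3 above can be checked directly against the standard genetic code. The sketch below builds the codon table in base R, counts the codons per amino acid, and counts how many of the nine single-base substitutions of a codon are silent or non-silent. The function name count_substitutions is made up for this example.</p>

```r
# Standard genetic code, codons in TCAG order with the third base varying fastest.
bases = c("T", "C", "A", "G")
grid = expand.grid(third = bases, second = bases, first = bases,
                   stringsAsFactors = FALSE)
codons = paste0(grid$first, grid$second, grid$third)
aa_string = paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                   "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG")
genetic_code = strsplit(aa_string, "")[[1]]
names(genetic_code) = codons

# Number of codons that encode each amino acid ("*" is a stop codon).
codons_per_aa = lengths(split(names(genetic_code), genetic_code))
print(codons_per_aa[c("W", "R")])

# Count silent and non-silent outcomes over all 9 single-base substitutions.
count_substitutions = function(codon) {
  original = genetic_code[[codon]]
  letters3 = strsplit(codon, "")[[1]]
  silent = 0
  nonsilent = 0
  for (pos in 1:3) {
    for (b in setdiff(bases, letters3[pos])) {
      mutated = letters3
      mutated[pos] = b
      new_aa = genetic_code[[paste0(mutated, collapse = "")]]
      if (new_aa == original) silent = silent + 1 else nonsilent = nonsilent + 1
    }
  }
  c(silent = silent, nonsilent = nonsilent)
}

print(count_substitutions("TGG"))  # the single tryptophan codon
print(count_substitutions("CGA"))  # one of the six arginine codons
```

<p>Every substitution in TGG changes the encoded residue, while four of the nine substitutions in CGA are silent, so the chance that a random mutation is non-silent differs from codon to codon.</p>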
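<p>The proposed method has four advantages:<br />
1. Non-synonymous mutations are scored according to their impact on protein function.<br />
2. Different samples are allowed to have different background mutation rates (BMR).<br />
3. It accounts for the fact that whether a mutation is non-silent or silent depends on the genetic code.<br />
4. Uncertainties in the background mutation rate are taken into account by using empirical Bayes methods.</p>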
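<p>To give a feel for points 2 and 4, here is a minimal empirical Bayes sketch for per-sample background rates. It uses a generic gamma-Poisson model fitted by the method of moments, which is an assumption made for illustration and not necessarily the model used in the paper. The counts and the number of sequenced bases are made-up values.</p>

```r
# Hypothetical silent-mutation counts for eight samples (made-up values).
silent_counts = c(2, 0, 5, 1, 30, 3, 1, 4)
n_bases = 1e6   # assumed number of sequenced bases per sample

# Raw per-sample background mutation rate: silent mutations per base.
raw_bmr = silent_counts / n_bases

# Method-of-moments fit of a gamma prior on the per-sample rate.
# The variance of raw_bmr is Poisson noise (mean / n_bases) plus prior variance.
prior_mean = mean(raw_bmr)
prior_var = max(var(raw_bmr) - prior_mean / n_bases, 1e-20)
rate = prior_mean / prior_var   # gamma rate parameter
shape = prior_mean * rate       # gamma shape parameter

# Posterior mean under the gamma-Poisson model: each raw rate is pulled
# part of the way toward the overall mean.
shrunk_bmr = (shape + silent_counts) / (rate + n_bases)

print(data.frame(silent_counts, raw_bmr, shrunk_bmr))
```

<p>Each sample keeps its own rate, but the estimate for an extreme sample is moderated by what the other samples show, which is one simple way to express uncertainty about the background rate.</p>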
<p>Five areas still need improvement:<br />
1. The functional impact also depends on the position at which a mutation occurs. (The method considers only the amino acid change that a mutation causes.)<br />
2. The current scoring system, which assigns mutation scores in the order missense mutation &lt; in-frame indel &lt; mutation in splice sites &lt; frameshift indel = nonsense mutation, may be biased toward identifying tumor suppressor genes over oncogenes.<br />
3. The background mutation model in Table 1 of the paper could be refined so that all six types of mutations, A:T→G:C, A:T→C:G, A:T→T:A, G:C→A:T, G:C→T:A and G:C→C:G, have separate mutation rates.<br />
4. Correlations among mutations were not taken into account in identifying driver genes.<br />
5. Copy number variation and sequencing data could be combined to identify driver genes.</p>
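<p>The six mutation types in item 3 arise because a substitution and its reverse complement describe the same base-pair change. A minimal sketch of that grouping follows; the function name classify_substitution is made up for this example, and the class labels use ">" in place of the arrow.</p>

```r
# Watson-Crick complement of each base.
complement = c(A = "T", C = "G", G = "C", T = "A")

# Map a single-base substitution to one of six strand-symmetric classes.
classify_substitution = function(ref, alt) {
  # Express the change on the strand where the reference base is A or G,
  # matching the A:T and G:C notation used in the list above.
  if (ref %in% c("T", "C")) {
    ref = complement[[ref]]
    alt = complement[[alt]]
  }
  paste0(ref, ":", complement[[ref]], ">", alt, ":", complement[[alt]])
}

# Classify all 12 possible substitutions and tabulate the classes.
pairs = expand.grid(ref = names(complement), alt = names(complement),
                    stringsAsFactors = FALSE)
pairs = pairs[pairs$ref != pairs$alt, ]
pairs$class = mapply(classify_substitution, pairs$ref, pairs$alt)
print(table(pairs$class))

# The example row shown earlier has reference allele G and tumor allele A.
print(classify_substitution("G", "A"))
```

<p>Each of the six classes collects exactly two of the twelve substitutions, so mutation counts per class can be tallied from the Reference_Allele and Tumor_Seq_Allele columns of the input tables.</p>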
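<p>R code to convert HGNC gene symbols to Ensembl gene IDs:<br />
library(biomaRt)<br />
# connect to the Ensembl BioMart human gene dataset<br />
ensembl=useMart("ensembl",dataset = "hsapiens_gene_ensembl")<br />
# input file: one gene symbol per line, no header<br />
all.gene.table = read.table("all_gene.symbol", header=FALSE, stringsAsFactors=FALSE)<br />
convert=getBM(attributes = c("chromosome_name","ensembl_gene_id","hgnc_symbol"),filters =c("hgnc_symbol"),values=all.gene.table[,1],mart=ensembl)<br />
chromosome=c(1:22,"X","Y","MT") # Ensembl names the mitochondrial chromosome "MT"<br />
convert=convert[!is.na(match(convert[,1],chromosome)),2:3] # keep only genes on chromosomes 1-22, X, Y, or MT, then drop the chromosome column<br />
convert=convert[rowSums(convert=="")==0,] # drop rows with an empty ID or symbol<br />
# the same two-column table (Ensembl ID, symbol) is written under two file names<br />
write.table(convert,"ensembl2symbol.list",quote = FALSE,row.names =FALSE,col.names =FALSE)<br />
write.table(convert,"all_gene_name.txt",quote = FALSE,row.names =FALSE,col.names =FALSE)</p>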
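<p>After a conversion like the one above, it can be useful to list the symbols that returned no Ensembl ID and the symbols that returned more than one. The sketch below uses a small hand-made data frame in place of the getBM() result so that it runs without network access. GENEA, ENSG_TOY_1 and ENSG_TOY_2 are placeholders, not real identifiers.</p>

```r
# Symbols that were submitted for conversion (toy input).
requested = c("TIGAR", "GENEA", "C12orf5")

# Toy stand-in for the two-column conversion result (Ensembl ID, symbol).
convert = data.frame(
  ensembl_gene_id = c("ENSG00000078237", "ENSG_TOY_1", "ENSG_TOY_2"),
  hgnc_symbol = c("TIGAR", "GENEA", "GENEA"),
  stringsAsFactors = FALSE
)

# Symbols that came back without any Ensembl ID.
unmatched = setdiff(requested, convert$hgnc_symbol)

# Symbols that map to more than one Ensembl ID.
id_counts = table(convert$hgnc_symbol)
multi_mapped = names(id_counts)[id_counts > 1]

print(unmatched)
print(multi_mapped)
```

<p>Here C12orf5 is reported as unmatched and GENEA as mapping to two IDs, which are the two situations worth inspecting by hand before the table is used downstream.</p>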
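<p>A gene symbol may correspond to several Ensembl IDs, but on any one chromosome the mapping is one-to-one.<br />
Some gene symbols may have no Ensembl ID at all, usually because the symbol is not an official HGNC symbol or is an outdated one.<br />
For example, TIGAR (details below) may well appear under its previous symbol, C12orf5.<br />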
Aliases for TIGAR Gene:<br />
TP53 Induced Glycolysis Regulatory Phosphatase<br />
TP53-Induced Glycolysis And Apoptosis Regulator<br />
C12orf5<br />
Probable Fructose-2,6-Bisphosphatase TIGAR<br />
Fructose-2,6-Bisphosphate 2-Phosphatase<br />
Chromosome 12 Open Reading Frame 5<br />
Fructose-2,6-Bisphosphatase TIGAR<br />
Transactivated By NS3TP2 Protein<br />
EC 3.1.3.46<br />
FR2BP<br />
External Ids for TIGAR Gene:<br />
HGNC: 1185 Entrez Gene: 57103 Ensembl: ENSG00000078237 OMIM: 610775 UniProtKB: Q9NQ88<br />
Previous HGNC Symbols for TIGAR Gene:<br />
C12orf5</p>
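<p>One way to handle outdated symbols such as C12orf5 is to replace them with the current HGNC symbol before querying Ensembl. The sketch below uses a hand-made lookup vector. Only the C12orf5 to TIGAR pair comes from the listing above, and the function name update_symbols is made up for this example.</p>

```r
# Hand-made lookup from previous symbols to current HGNC symbols.
# Only the C12orf5 -> TIGAR pair is taken from the text above.
previous_to_current = c(C12orf5 = "TIGAR")

# Replace any symbol found in the lookup and leave the others untouched.
update_symbols = function(symbols, lookup) {
  is_old = symbols %in% names(lookup)
  symbols[is_old] = unname(lookup[symbols[is_old]])
  symbols
}

symbols = c("TP53", "C12orf5", "KRAS")
updated = update_symbols(symbols, previous_to_current)
print(updated)
```

<p>The updated vector can then be passed as the values argument of getBM(). In practice the lookup would have to be built from a maintained source of previous symbols rather than typed by hand.</p>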
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/966.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
