<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Bioinformatics Rookie Group (生信菜鸟团) &#187; Basic Data Formats</title>
	<atom:link href="http://www.bio-info-trainee.com/category/basic-bio-infomatics/data-format/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Bioinformatics Skill Tree Forum: Introducing the Bioinformatics Basics Board (Sequencing Basics)</title>
		<link>http://www.bio-info-trainee.com/2842.html</link>
		<comments>http://www.bio-info-trainee.com/2842.html#comments</comments>
		<pubDate>Thu, 16 Nov 2017 02:01:13 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic data formats]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2842</guid>
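		<description><![CDATA[If you started following us recently, you are about to discover another great place to learn bioinformatics; if you have been following us all along, you &#8230; <a href="http://www.bio-info-trainee.com/2842.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>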
				<content:encoded><![CDATA[<div class="markdown-here-wrapper" data-md-url="http://www.bio-info-trainee.com/wp-admin/post-new.php">
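<p style="margin: 0px 0px 1.2em !important;">If you started following us recently, you are about to discover another great place to learn bioinformatics;<br />
if you have been following us all along, this place will be familiar to you:<br />
it is our <strong>Bioinformatics Skill Tree forum (生信技能树论坛)</strong>.<br />
This week we introduce the forum's Bioinformatics Basics board.</p>
<p style="margin: 0px 0px 1.2em !important; text-align: right;"><span style="color: #ff00ff;">Author: 梅零落</span></p>
<p style="margin: 0px 0px 1.2em !important;"><span id="more-2842"></span><br />
First up is sequencing basics.<br />
Founder jimmy recently gave a detailed introduction to the forum; for details, see the <strong>Sincerity in Capital Letters (大写的真诚)</strong> section at the end. Let's use the forum search to see what this board contains. The board's posts are loosely grouped below, so you can try searching the board for the following keywords (as shown in the figure) and see whether it has what you are looking for (#^.^#)<br />
<img src="https://i.gyazo.com/2a1ddcef9f749a8b84bf15bf7c63d76e.png" alt="" /></p>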
<h3 id="-" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.3em;">Keyword: 生信人 (bioinformaticians)</h3>
<p style="margin: 0px 0px 1.2em !important;">1.<a href="http://www.biotrainee.com/thread-87-1-5.html">Public database APIs every bioinformatician must know (ongoing collection)</a><br />
2.<a href="http://www.biotrainee.com/thread-119-1-5.html">Common FTP resource centers every bioinformatician must know</a><br />
3.<a href="http://www.biotrainee.com/thread-42-1-5.html">Data formats every bioinformatician must master (ongoing collection)</a><br />
4.<a href="http://www.biotrainee.com/thread-120-1-5.html">Sharing bioinformaticians' browser bookmarks</a><br />
5.<a href="http://www.biotrainee.com/thread-161-1-5.html">Must-know software for bioinformaticians: genome browsers</a><br />
6.<a href="http://www.biotrainee.com/thread-153-1-3.html">Must-know software for bioinformaticians: bedtools</a><br />
7.<a href="http://www.biotrainee.com/thread-154-1-2.html">Must-know software for bioinformaticians: samtools</a><br />
8.<a href="http://www.biotrainee.com/thread-1523-1-2.html">Ten things a bioinformatician must do after getting their own server</a><br />
9.<a href="http://www.biotrainee.com/thread-41-1-1.html">Classic databases every bioinformatician must know: NCBI, UCSC, ENSEMBL</a><br />
10.<a href="http://www.biotrainee.com/thread-43-1-1.html">The various ID notations every bioinformatician must understand</a></p>
<h3 id="-" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.3em;">Keyword: 数据库 (databases)</h3>
<p style="margin: 0px 0px 1.2em !important;">1.<a href="http://www.biotrainee.com/thread-253-1-5.html">Gene expression databases</a><br />
2.<a href="http://www.biotrainee.com/thread-292-1-4.html">A comprehensive list of FTP links to public bioinformatics data</a><br />
3.<a href="http://www.biotrainee.com/thread-810-1-4.html">How to submit Affymetrix microarray data to the GEO database</a><br />
4.<a href="http://www.biotrainee.com/thread-1041-1-4.html">Downloading and parsing the HPO database</a><br />
5.<a href="http://www.biotrainee.com/thread-991-1-4.html">The ClinVar database explained</a><br />
6.<a href="http://www.biotrainee.com/thread-1087-1-3.html">A GTEx database application: Neanderthal genes still influence modern humans</a><br />
7.<a href="http://www.biotrainee.com/thread-1151-1-3.html">To understand a database or project in depth, read the literature</a><br />
8.<a href="http://www.biotrainee.com/thread-1152-1-3.html">To understand a database or project in depth, read the FAQ</a><br />
9.<a href="http://www.biotrainee.com/thread-1212-1-3.html">Predicting protein-coding potential with the Pfam database</a><br />
10.<a href="http://www.biotrainee.com/thread-1229-1-3.html">Databases covered by the GeneCards database</a><br />
11.<a href="http://www.biotrainee.com/thread-411-1-3.html">ID notations of commonly used databases</a><br />
12.<a href="http://www.biotrainee.com/thread-1669-1-2.html">A summary of metabolic pathway databases</a><br />
13.<a href="http://www.biotrainee.com/thread-1505-1-2.html">Descriptive metadata for all samples in NCBI's SRA database</a><br />
14.<a href="http://www.biotrainee.com/thread-2091-1-1.html">Introduction to the TCIA database</a><br />
15.<a href="http://www.biotrainee.com/thread-1477-1-2.html">A COSMIC database tutorial for bioinformatics engineers</a><br />
16.<a href="http://www.biotrainee.com/thread-40-1-1.html">Genetic variation resource databases (ongoing collection)</a><br />
17.<a href="http://www.biotrainee.com/thread-2186-1-1.html">A database of human cellular response biological networks</a></p>
<h3 id="-" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.3em;">Keyword: 测序 (sequencing)</h3>
<p style="margin: 0px 0px 1.2em !important;">1.<a href="http://www.biotrainee.com/thread-294-1-5.html">First-, second-, and third-generation sequencing principles explained, with video</a><br />
2.<a href="http://www.biotrainee.com/thread-324-1-4.html">Know your sequencing data thoroughly: on the importance of QC</a><br />
3.<a href="http://www.biotrainee.com/thread-932-1-4.html">Obtaining Illumina sequencing adapters</a><br />
4.<a href="http://www.biotrainee.com/thread-1007-1-4.html">A website that examines in detail why sequencing quality fails</a><br />
5.<a href="http://www.biotrainee.com/thread-1002-1-4.html">Sanger sequencing is the gold standard of sequencing and still worth knowing</a><br />
6.<a href="http://www.biotrainee.com/thread-938-1-3.html">Sequencer comparison: MiSeq vs PGM</a><br />
7.<a href="http://www.biotrainee.com/thread-289-1-3.html">Glossary of basic sequencing terms</a><br />
8.<a href="http://www.biotrainee.com/thread-1515-1-2.html">For Hi-C sequencing data analysis, read these articles</a><br />
9.<a href="http://www.biotrainee.com/thread-1694-1-2.html">Uploading raw high-throughput sequencing files</a><br />
10.<a href="http://www.biotrainee.com/thread-2164-1-1.html">Quality control of sequencing data with Cutadapt</a><br />
11.<a href="http://www.biotrainee.com/thread-617-1-1.html">A review of next-generation sequencing technologies</a><br />
12.<a href="http://www.biotrainee.com/thread-2174-1-1.html">Insert size in sequencing</a><br />
13.<a href="http://www.biotrainee.com/thread-1544-1-2.html">HiSeq 4000 recipe (sequencing principles)</a></p>
<h3 id="-" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.3em;">Keyword: 基因 (gene)</h3>
<p style="margin: 0px 0px 1.2em !important;">1.<a href="http://www.biotrainee.com/thread-225-1-5.html">Did you write your gene names correctly?</a><br />
2.<a href="http://www.biotrainee.com/thread-661-1-4.html">A complete guide to coordinate conversion between genome builds (a classic Biostar post)</a><br />
3.<a href="http://www.biotrainee.com/thread-511-1-4.html">Gene symbols with strange prefixes</a><br />
4.<a href="http://www.biotrainee.com/thread-882-1-4.html">The 1000 Genomes Project</a><br />
5.<a href="http://www.biotrainee.com/thread-427-1-4.html">The difference between IGB and IGV: genome start coordinates</a><br />
6.<a href="http://www.biotrainee.com/thread-1010-1-4.html">NCBI's reference genome project</a><br />
7.<a href="http://www.biotrainee.com/thread-1067-1-4.html">BodyMap and GTEx can be used to look up a gene's expression values in normal tissues</a><br />
8.<a href="http://www.biotrainee.com/thread-1424-1-3.html">The reference genome used by the 1000 Genomes Project</a><br />
9.<a href="http://www.biotrainee.com/thread-1425-1-3.html">The Global Alliance for Genomics and Health (GA4GH)</a><br />
10.<a href="http://www.biotrainee.com/thread-1426-1-3.html">Google announces it is joining the Global Alliance for Genomics and Health</a><br />
11.<a href="http://www.biotrainee.com/thread-1427-1-2.html">The Global Ant Genomics Alliance (GAGA)</a><br />
12.<a href="http://www.biotrainee.com/thread-1436-1-2.html">Selecting all lncRNAs from a gene expression matrix</a><br />
13.<a href="http://www.biotrainee.com/thread-1547-1-2.html">Do you really know the mouse genome? Mus musculus GRCm38.p5</a><br />
14.<a href="http://www.biotrainee.com/thread-1700-1-2.html">Annotating genomic regions with genes using bedtools</a><br />
15.<a href="http://www.biotrainee.com/thread-2254-1-1.html">A brief history of 3D genomics</a><br />
16.<a href="http://www.biotrainee.com/thread-857-1-1.html">Common bioinformatics data downloads, including genomes, GTF, BED, and annotations</a></p>
<h3 id="-" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.3em;">Keyword: 格式 (format)</h3>
<p style="margin: 0px 0px 1.2em !important;">1.<a href="http://www.biotrainee.com/thread-171-1-5.html">Pileup format</a><br />
2.<a href="http://www.biotrainee.com/thread-412-1-5.html">Edit distance (NM tag): advanced notes on the SAM/BAM format</a><br />
3.<a href="http://www.biotrainee.com/thread-1050-1-3.html">Study notes on bioinformatics formats</a><br />
4.<a href="http://www.biotrainee.com/thread-1153-1-3.html">Common bioinformatics file formats collected by UCSC</a><br />
5.<a href="http://www.biotrainee.com/thread-1934-1-1.html">SAM file format specification (Chinese translation)</a></p>
<h3 id="-ngs" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.3em;">Keyword: NGS</h3>
<p style="margin: 0px 0px 1.2em !important;">1.<a href="http://www.biotrainee.com/thread-1923-1-1.html">NGS basics: the FASTQ format explained and quality assessment</a><br />
2.<a href="http://www.biotrainee.com/thread-1484-1-1.html">NGS data filtering: Trimmomatic explained in detail</a><br />
3.<a href="http://www.biotrainee.com/thread-1382-1-1.html">The duplication problem in NGS data</a></p>
<h3 id="-" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.3em;">Other (uncategorized)</h3>
<p style="margin: 0px 0px 1.2em !important;">1.<a href="http://www.biotrainee.com/thread-1598-1-1.html">SAM files have countless tags; how many people truly understand them?</a><br />
2.<a href="http://www.biotrainee.com/thread-555-1-4.html">Do you really understand AAchange annotation?</a><br />
3.<a href="http://www.biotrainee.com/thread-1592-1-2.html">You can never know what kind of input your dumb users will feed your program.</a><br />
4.<a href="http://www.biotrainee.com/thread-1115-1-3.html">How do you get uniquely mapped reads?</a><br />
5.<a href="http://www.biotrainee.com/thread-1173-1-3.html">How do you extract the names or sequences of reads aligned to a specific position from a BAM file?</a><br />
6.<a href="http://www.biotrainee.com/thread-1803-1-1.html">How to draw ROC curves with TCGA data</a><br />
7.<a href="http://www.biotrainee.com/thread-1126-1-3.html">How to simulate lower sequencing depth with seqtk</a><br />
8.<a href="http://www.biotrainee.com/thread-1626-1-2.html">Is the GATK re-alignment step necessary?</a><br />
9.<a href="http://www.biotrainee.com/thread-1662-1-2.html">GATK test data</a><br />
10.<a href="http://www.biotrainee.com/thread-1498-1-2.html">Introduction to IPA software: a detailed comparison of Ingenuity Pathway Analysis</a><br />
11.<a href="http://www.biotrainee.com/thread-1499-1-2.html">How do IPA pathways differ from KEGG pathways?</a><br />
12.<a href="http://www.biotrainee.com/thread-1618-1-2.html">A dataset you may find useful: RNA-seq of 675 commonly used human cancer cell lines</a><br />
13.<a href="http://www.biotrainee.com/thread-100-1-5.html">Ontology: concepts and a comprehensive list</a><br />
14.<a href="http://www.biotrainee.com/thread-862-1-1.html">A complete guide to ID conversion</a><br />
15.<a href="http://www.biotrainee.com/thread-213-1-5.html">Introduction to RefSeq</a><br />
16.<a href="http://www.biotrainee.com/thread-280-1-5.html">Downloading SRA/FASTQ data more efficiently with Aspera</a><br />
17.<a href="http://www.biotrainee.com/thread-403-1-5.html">A brief history of miRBase</a><br />
18.<a href="http://www.biotrainee.com/thread-806-1-4.html">Reading FASTA sequences into R</a><br />
19.<a href="http://www.biotrainee.com/thread-896-1-4.html">Introduction to ICGC: International Cancer Genome Consortium</a><br />
20.<a href="http://www.biotrainee.com/thread-959-1-4.html">ENCODE project knowledge collection</a><br />
21.<a href="http://www.biotrainee.com/thread-1031-1-4.html">The GTEx project detects genetic effects</a><br />
22.<a href="http://www.biotrainee.com/thread-1222-1-3.html">Some molecular markers map directly to plant traits</a><br />
23.<a href="http://www.biotrainee.com/thread-800-1-3.html">NCBI's SRA data structure</a><br />
24.<a href="http://www.biotrainee.com/thread-1411-1-3.html">bedtools, a great tool for interval annotation</a><br />
25.<a href="http://www.biotrainee.com/thread-1566-1-2.html">How to use deepTools</a><br />
26.<a href="http://www.biotrainee.com/thread-1580-1-2.html">Important header tags in VCF files (additions welcome)</a><br />
27.<a href="http://www.biotrainee.com/thread-847-1-1.html">Installing wget on a Mac</a><br />
28.<a href="http://www.biotrainee.com/thread-1150-1-1.html">Genomic region blacklists: an important concept</a><br />
29.<a href="http://www.biotrainee.com/thread-2187-1-1.html">LINCS: the Library of Integrated Network-based Cellular Signatures</a></p>
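<h3 id="-" style="margin: 1.3em 0px 1em; padding: 0px; font-weight: bold; font-size: 1.3em;">Sincerity in Capital Letters (大写的真诚)</h3>
<p style="margin: 0px 0px 1.2em !important;">We warmly welcome capable, committed people to help build the forum. If you are interested, please contact us.<br />
<a href="http://mp.weixin.qq.com/s/QrsPkASgyKwVB6n9qemFLQ">How I take part in building the Bioinformatics Skill Tree forum</a><br />
<a href="https://mp.weixin.qq.com/s/BSODkbRQ2kFp60zoUutxFg">How to be a good moderator: systematically organizing the resources of a field</a></p>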
</div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2842.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making your own gene set file for GSEA</title>
		<link>http://www.bio-info-trainee.com/2144.html</link>
		<comments>http://www.bio-info-trainee.com/2144.html#comments</comments>
		<pubDate>Thu, 15 Dec 2016 11:43:56 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic data formats]]></category>
		<category><![CDATA[Basic software]]></category>
		<category><![CDATA[geneset]]></category>
		<category><![CDATA[GMT]]></category>
		<category><![CDATA[GSEA]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2144</guid>
		<description><![CDATA[Anyone familiar with GSEA knows that it only needs GCT, CLS, and GMT files. As for GMT files, G &#8230; <a href="http://www.bio-info-trainee.com/2144.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Anyone familiar with GSEA knows that it only needs GCT, CLS, and GMT files. As for GMT files, the GSEA authors already provide plenty: Broad's <a href="http://software.broadinstitute.org/gsea/msigdb/collections.jsp">Molecular Signatures Database (MSigDB) </a>has collected 18026 gene sets. <span style="color: #ff00ff;"><strong>What surprises me is that it contains no cancer testis gene set. MSigDB is certainly large, but it is not necessarily complete, it contains many duplicates, and quite a few of its gene sets are nearly meaningless.</strong></span> So to run GSEA on a gene set of my own, I have to produce the GMT-format data myself. After all, the gene sets downloaded from MSigDB are themselves just GMT-format data: <a href="http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29">http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29</a><span id="more-2144"></span></p>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/41.png"><img class="alignnone size-full wp-image-2145" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/41.png" alt="4" width="937" height="615" /></a></div>
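<div>First, get the gene list of the gene set you are interested in, preferably as the standard symbols defined by HUGO.</div>
<div>For example, I am interested in: <a href="http://www.cta.lncc.br/modelo.php">http://www.cta.lncc.br/modelo.php</a></div>
<div>Here I provide R code that converts a two-column file directly into GMT!</div>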
<div>
<div>The file comes from: <a href="http://www.bio-info-trainee.com/1188.html">Download the latest KEGG information and parse it</a>, as shown below:</div>
<div><img class="alignnone" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/12/image004.png" alt="" width="745" height="326" /></div>
<div>First, in R, set the variable path2gene_file to the kegg2gene.txt file shown in the figure and read it into R:</div>
<div>tmp &lt;- read.table(path2gene_file, sep="\t", colClasses=c('character'))</div>
<div># tmp &lt;- toTable(org.Hs.egPATH)</div>
<div># first column is KEGG ID, second column is Entrez ID</div>
<div>GeneID2kegg_list &lt;- tapply(tmp[,1], as.factor(tmp[,2]), function(x) x)</div>
<div>kegg2GeneID_list &lt;- tapply(tmp[,2], as.factor(tmp[,1]), function(x) x)</div>
<div>The variable kegg2GeneID_list is a list. Because it holds Entrez gene IDs, they need to be converted to symbols; I won't go into that here. The converted data is kegg2symbol_list.</div>
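<div>The conversion step skipped above could look roughly like the following. This is only a minimal sketch: the lookup table <code>entrez2symbol</code>, the toy pathway names, and the ID <code>99999</code> are assumptions made up for illustration, not the author's data. In practice the lookup would come from an annotation resource such as the org.Hs.eg.db package.</div>

```r
# Hypothetical Entrez ID -> symbol lookup (a named character vector).
entrez2symbol <- c("7157" = "TP53", "672" = "BRCA1", "1956" = "EGFR")

# Toy stand-in for kegg2GeneID_list: pathway name -> Entrez IDs.
# "99999" is a made-up ID with no entry in the lookup.
kegg2GeneID_toy <- list(
  pathway_1 = c("7157", "1956"),
  pathway_2 = c("672", "99999")
)

# Map every ID to its symbol and drop IDs that have no symbol.
kegg2symbol_toy <- lapply(kegg2GeneID_toy, function(ids) {
  symbols <- unname(entrez2symbol[ids])
  symbols[!is.na(symbols)]
})

print(kegg2symbol_toy)
```

<div>Dropping unmapped IDs silently is a design choice of this sketch; depending on the analysis it may be better to keep a record of them.</div>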
<div>Finally, write kegg2symbol_list out as a GMT file:</div>
<div>
<blockquote>
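<div># Write a named list of gene sets to a GMT file, one gene set per line</div>
<div>write.gmt &lt;- function(geneSet=kegg2symbol_list,gmt_file='kegg2symbol.gmt'){</div>
<div>sink( gmt_file )  # redirect cat() output into the file</div>
<div>for (i in seq_along(geneSet)){</div>
<div>cat(names(geneSet)[i])  # column 1: gene set name</div>
<div>cat('\tNA\t')  # column 2: description, left as NA</div>
<div>cat(paste(geneSet[[i]],collapse = '\t'))  # remaining columns: the genes</div>
<div>cat('\n')</div>
<div>}</div>
<div>sink()  # stop redirecting output</div>
<div>}</div>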
</blockquote>
</div>
</div>
<div><img class="alignnone size-full wp-image-2146" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/5.png" alt="5" width="555" height="562" /></div>
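<div>To check that a GMT file has the expected layout (set name, description, then genes, all tab-separated), a small round trip can help. The sketch below is an illustration under assumptions: the gene sets in <code>toy_sets</code> are invented, and it uses <code>writeLines</code> instead of <code>sink</code> purely to keep the example short.</div>

```r
# Hypothetical toy gene sets; names and members are illustrative only.
toy_sets <- list(
  SET_A = c("TP53", "BRCA1", "EGFR"),
  SET_B = c("MAGEA1", "CTAG1B")
)

# One GMT line per set: name, description (NA), then the genes.
gmt_lines <- vapply(
  names(toy_sets),
  function(nm) paste(c(nm, "NA", toy_sets[[nm]]), collapse = "\t"),
  character(1)
)

gmt_file <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_file)

# Read the file back into a named list to confirm nothing was lost.
fields <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)
read_back <- lapply(fields, function(f) f[-(1:2)])
names(read_back) <- vapply(fields, function(f) f[1], character(1))

print(identical(read_back, toy_sets))
```

<div>The same read-back step should also work on a file produced by a function like write.gmt, as long as every set has a name and at least one gene.</div>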
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2144.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gene symbols with strange prefixes</title>
		<link>http://www.bio-info-trainee.com/2129.html</link>
		<comments>http://www.bio-info-trainee.com/2129.html#comments</comments>
		<pubDate>Sun, 11 Dec 2016 00:48:20 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Basic data formats]]></category>
		<category><![CDATA[entrez ID]]></category>
		<category><![CDATA[symbol]]></category>
		<category><![CDATA[Gene mapping]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2129</guid>
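		<description><![CDATA[I originally wrote this as a basic knowledge note for the forum's basics board, but it got very few views. Not wanting it to gather dust, I am &#8230; <a href="http://www.bio-info-trainee.com/2129.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>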
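				<content:encoded><![CDATA[<p>I originally wrote this as a basic knowledge note for the forum's basics board, but it got very few views. Not wanting it to gather dust, I am republishing it on the blog! Original post: <a href="http://www.biotrainee.com/thread-511-1-1.html" target="_blank">http://www.biotrainee.com/thread-511-1-1.html</a></p>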
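<p>Gene symbols are official names maintained by the HUGO organization, which has a dedicated database: HGNC database of human gene names | HUGO<br />
When analyzing data in the past, I came across gene symbols that looked strange and puzzled me, for example:<br />
the C orf series,<br />
the HS. series,<br />
the KRTAP series,<br />
the LOC series,<br />
the MIR series,<br />
the LINC series.<br />
Each series often contains several hundred genes;<br />
C12orf44: Chromosome 12 Open Reading Frame 44; this is what the C orf series names mean<br />
The MIR series are presumably miRNA-related genes<br />
The LINC series presumably stands for long intergenic non-protein coding RNA<br />
LOC series genes have unofficial, provisional names that may later be replaced by more suitable ones<br />
I have prepared the complete gene mapping; download it from the 生信菜鸟团 QQ group. It covers 47938 genes, mapping each symbol to its Entrez gene ID, name, and aliases!</p>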
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/12.png"><img class="alignnone size-full wp-image-2130" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/12/12.png" alt="1" width="535" height="450" /></a><br />
Some RNA genes have no symbol at all, for example the CTA/B/C/D series:<br />
Quality Score for this RNA gene is 1<br />
Aliases for ENSG00000271971 Gene<br />
CTD-2006H14.2 5<br />
External Ids for ENSG00000271971 Gene<br />
Ensembl: ENSG00000271971<br />
Also, if you see a gene starting with HS., that is a UniGene ID and no longer a symbol at all.</p>
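<p>The prefix families above are easy to tell apart with shell pattern matching. The sketch below is only an illustration: the function name <code>classify_symbol</code> and the exact glob patterns are my own assumptions, not an official HGNC rule set, and unusual symbols may be misclassified.</p>

```shell
# Hypothetical classifier for the symbol families discussed in this post.
classify_symbol() {
  case "$1" in
    C[0-9XY]*orf[0-9]*) echo "chromosome open reading frame" ;;
    HS.*|Hs.*)          echo "UniGene ID, not a symbol" ;;
    KRTAP*)             echo "KRTAP series" ;;
    LINC*)              echo "long intergenic non-protein coding RNA" ;;
    LOC[0-9]*)          echo "provisional LOC symbol" ;;
    MIR*)               echo "miRNA gene" ;;
    *)                  echo "other" ;;
  esac
}

classify_symbol C12orf44
classify_symbol TP53
```

<p>A table of symbols can then be tallied by family by calling the function once per symbol and piping the result through <code>sort | uniq -c</code>.</p>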
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2129.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TPM is just each gene's share of the total RPKM!</title>
		<link>http://www.bio-info-trainee.com/2017.html</link>
		<comments>http://www.bio-info-trainee.com/2017.html#comments</comments>
		<pubDate>Mon, 14 Nov 2016 11:34:12 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础数据格式]]></category>
		<category><![CDATA[FPKM]]></category>
		<category><![CDATA[RPKM]]></category>
		<category><![CDATA[TPM]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2017</guid>
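		<description><![CDATA[Someone asked me this question long ago. Although the mainstream still uses RPKM/FPKM to describe a gene's &#8230; <a href="http://www.bio-info-trainee.com/2017.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Someone asked me this question long ago. Although the mainstream still uses RPKM/FPKM to describe a gene's expression level, everyone says TPM is better, so let me look into it!</p>
<p>I don't like reading formulas, so let's go straight to an example. Suppose gene A has 5000 reads that were sequenced and mapped to the genome in this sample's transcriptome data, gene A is 10 kb long, and the sequencing library holds 50 M reads in total. The RPKM of gene A is then 5000 divided by 10 (length in kb), divided again by 50 (library size in millions), which gives 10. In other words, the gene's read count is normalized by gene length and by the sample's library size.<span id="more-2017"></span></p>
<p>So what is its TPM? The information above is no longer enough: we also need the RPKM of every other gene in the sample. Suppose the sample has 3 genes and the other two have RPKM values of 5 and 35. Converting gene A's RPKM of 10 into TPM gives<strong><span style="text-decoration: underline;"><span style="color: #ff00ff; text-decoration: underline;"> 1,000,000 * 10 / (5 + 10 + 35) = 200,000. </span></span></strong>That looks rather large, but only because we assumed so few genes. A real sample has more than twenty thousand genes, so the total is far larger and TPM no longer differs from RPKM so dramatically.</p>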
<p><span style="color: #ff00ff;"><strong>TPM is just each gene's share of the total RPKM, scaled to one million!!!</strong></span></p>
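<p>You will surely ask what the advantage of TPM is. Quite simply, the TPM values of all genes always add up to 1 M, because the shares sum to 1. This holds regardless of the sample, so every sample has the same TPM total, which makes comparisons between samples more meaningful!!!</p>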
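<p>The worked example can be reproduced with a few lines of awk. This is a minimal sketch under stated assumptions: the function name <code>rpkm_tpm</code> and the input layout (gene, mapped reads, gene length in bp) are mine, and the read counts and lengths of genes B and C are invented so that their RPKM values come out as the 5 and 35 used above.</p>

```shell
# Hypothetical helper. stdin: gene, mapped reads, gene length in bp.
# $1: total number of reads in the library.
rpkm_tpm() {
  awk -v total="$1" '
    {
      gene[NR] = $1
      # RPKM = reads / (length in kb) / (library size in millions)
      rpkm[NR] = $2 / ($3 / 1000) / (total / 1000000)
      sum += rpkm[NR]
    }
    END {
      # TPM = share of the summed RPKM, scaled to one million
      for (i = 1; i <= NR; i++)
        printf "%s\t%g\t%g\n", gene[i], rpkm[i], rpkm[i] / sum * 1000000
    }'
}

# Gene A is the example from the text: 5000 reads, 10 kb, 50 M library.
printf 'A 5000 10000\nB 1250 5000\nC 35000 20000\n' | rpkm_tpm 50000000
```

<p>The columns printed are gene, RPKM and TPM. Gene A comes out with RPKM 10 and TPM 200000, and the three TPM values add up to one million.</p>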
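<p>I have not covered FPKM here. It is the same calculation with fragments counted instead of reads; look it up yourself, there is not much more to it.</p>
<p>Finally, here are the formulas after all!</p>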
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/12.png"><img class="alignnone size-full wp-image-2018" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/11/12.png" alt="1" width="613" height="587" /></a></p>
<p>&nbsp;</p>
<p>A pile of references I was too lazy to read:</p>
<div><a href="http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/">http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/</a></div>
<div><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702322/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702322/</a></div>
<div><a href="https://www.biostars.org/p/88751/">https://www.biostars.org/p/88751/</a></div>
<div><a href="https://www.biostars.org/p/133488/">https://www.biostars.org/p/133488/</a></div>
<div><a href="https://www.biostars.org/p/115674/">https://www.biostars.org/p/115674/</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2017.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finally ran into color-space sequencing data!</title>
		<link>http://www.bio-info-trainee.com/1850.html</link>
		<comments>http://www.bio-info-trainee.com/1850.html#comments</comments>
		<pubDate>Thu, 04 Aug 2016 00:23:08 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础数据格式]]></category>
		<category><![CDATA[生信基础]]></category>
		<category><![CDATA[csfasta]]></category>
		<category><![CDATA[csfastq]]></category>
		<category><![CDATA[qual]]></category>
		<category><![CDATA[solid]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1850</guid>
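		<description><![CDATA[Illumina's share of the sequencer market really is staggering: even someone like me, a battle-hardened veteran of bioinformatics data analysis &#8230; <a href="http://www.bio-info-trainee.com/1850.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Illumina's share of the sequencer market really is staggering: even a battle-hardened veteran of bioinformatics data analysis like me only ran into color-space sequencing data today. The platform was the AB 5500xl Genetic Analyzer, i.e. the legendary SOLiD format. I came across it while studying a paper on the binding ability of the tp53 transcription factor. The data I downloaded was still in fastq format, but it looked bizarre and I did not recognize the sequences in it at all. Here is a summary of what I learned and how I worked through it. It is a bit messy, so just skim it!</p>
<div>First: the data delivered by the sequencer should be two files with the suffixes<b><span style="color: #ff0000;"> (.csfasta &amp; .qual) </span></b></div>
<div>Then, a script can convert the data to csfastq format. Its layout is the same as ordinary fastq, but it contains color codes rather than bases.</div>
<div>Next, <strong><span style="color: #ff00ff;">color space data should not be converted to base space!!!</span></strong></div>
<div>Finally, the reason for converting to csfastq is compatibility with many tools: fastqc, cutadapt, SHRiMP, sequel, BFAST, bowtie and so on.</div>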
<p><span id="more-1850"></span></p>
<div>
<div>csfastq data looks like this, still four lines per read:</div>
<blockquote>
<div>@SRR2967009.1 100_1000_1168_F3</div>
<div>T10011023211201220121202030102221012302121010131001</div>
<div>+</div>
<div>2@@@@&gt;@?@@@@&lt;@@//;@@/@9?@8@=@@@6;6@66;&lt;@6@67?2?;/@</div>
<div>@SRR2967009.2 100_1000_1211_F3</div>
<div>T20132312201120021312220200023110220113100012321011</div>
<div>+</div>
<div>@@@@@@@@@&lt;@@@@@@@@@@@@@@@@@@@@@@?@@@@/?@@@@@@@@&lt;?@</div>
<div>@SRR2967009.3 100_1000_1272_F3</div>
<div>T33222002231020000110132110001032232200332111022002</div>
</blockquote>
<div>At first I was completely lost, and only got a rough understanding after looking things up.</div>
</div>
<div>Generally, in a classic fastq file, the first line begins with "@", the second line is the read sequence, the third line is a "+" and the fourth line is the quality string.<br />
However, in these fastq files the read sequences are made of digits ("0,1,2,3").</div>
<div>In fact, this fastq is not the raw output of the sequencer, which delivers two files with the suffixes<b><span style="color: #ff0000;"> (.csfasta &amp; .qual) </span></b>. Normally the SOLiD output files (.csfasta &amp; .qual) are merged into one integrated .csfastq file. The resulting csfastq follows the fastq layout but differs slightly from ordinary fastq.</div>
<div>So what our fastq holds is not bases but color codes. Colors may be encoded either as numbers (<code>0</code>=blue, <code>1</code>=green, <code>2</code>=orange, <code>3</code>=red) or as characters <code>A/C/G/T</code> (<code>A</code>=blue, <code>C</code>=green, <code>G</code>=orange, <code>T</code>=red).</div>
<div>
<pre><code>&gt;1_53_33_F3
T2213120002010301233221223311331
&gt;1_53_70_F3
T2302111203131231130300111123220
...</code></pre>
<p>Here, <code>T</code> is the primer base. <code>bowtie</code> detects and handles primer bases properly (i.e., the primer base and the adjacent color are both trimmed away prior to alignment) as long as the rest of the read is encoded as numbers.</p>
</div>
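<div>Given this layout, a quick check can tell whether a fastq stream is color space before any tool is run on it. The sketch below is an assumption-laden illustration: the function name <code>is_colorspace_fastq</code> is made up, it expects plain four-line records on standard input, and it only recognizes the numeric encoding (a primer base followed by the digits 0 to 3, with "." allowed for a missing color call).</div>

```shell
# Hypothetical check: exit status 0 if every sequence line looks like color space.
is_colorspace_fastq() {
  awk '
    BEGIN { total = 0; colored = 0 }
    # The sequence is the second line of every four-line record.
    NR % 4 == 2 {
      total++
      if ($0 ~ /^[ACGT][0-3.]+$/) colored++
    }
    END { if (total == 0 || colored != total) exit 1 }'
}

# Shortened, made-up record in the same style as the reads shown above.
if printf '%s\n' '@read1' 'T10011023' '+' '2@@@@@@@' | is_colorspace_fastq; then
  echo "color space"
else
  echo "base space"
fi
```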
<div>If you already know the data is SOLiD when downloading from the SRA database, use abi-dump rather than fastq-dump.</div>
<div>Take <a href="http://www.ncbi.nlm.nih.gov/sra?term=SRP066824">http://www.ncbi.nlm.nih.gov/sra?term=SRP066824</a> as an example:</div>
<div>
<div>
<div>First download the data and extract it:</div>
<div>for ((i=7009;i&lt;7014;i++)) ;do wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP066/SRP066824/SRR296$i/SRR296$i.sra;done</div>
<div>Because the platform is the AB 5500xl Genetic Analyzer, i.e. the legendary SOLiD format, fastq-dump is the wrong tool here; abi-dump is the right one!</div>
<div>
<div>Reference: <a href="http://davetang.org/muse/2012/07/04/from-sra-to-fastq-for-solid-data/">http://davetang.org/muse/2012/07/04/from-sra-to-fastq-for-solid-data/</a></div>
<div>ls *sra |while read id; do ~/biosoft/sratoolkit/sratoolkit.2.6.3-centos_linux64/bin/<b><span style="color: #ff0000;">abi-dump</span></b> $id;done</div>
</div>
<div>After extraction the result looks like this:</div>
<div><img class="alignnone size-full wp-image-1851" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/11.png" alt="1" width="310" height="112" /></div>
</div>
<div>This only produces a csfasta file and a qual file. To convert them to fastq, download a script written by the renowned lh: wget <a href="http://www.bbmriwiki.nl/svn/bwa_45_patched/solid2fastq.pl">http://www.bbmriwiki.nl/svn/bwa_45_patched/solid2fastq.pl</a></div>
<p>The script is very easy to use: perl solid2fastq.pl  SRR2967009_ SRR2967009 is all it takes.</p>
</div>
<div>The conversion can also be done with a Python program: <a href="http://edison.cremag.org/resources/seq-analysis/tools/solid2fastq/">http://edison.cremag.org/resources/seq-analysis/tools/solid2fastq/</a></div>
<div></div>
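<div><span style="color: #ff0000;">The end result is color-space data in fastq format. However, I tested it, and fastq-dump can also extract the data directly as color-space fastq without all that hassle, because we are downloading from the SRA database rather than taking data off the sequencer. (One addition: the fastq produced directly by fastq-dump differs from what you get by running abi-dump and then converting to csfastq. I cannot yet say exactly what the difference is, so I recommend abi-dump.)</span></div>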
<div>
<h3>SOLiD native (CSFASTA/QUAL)</h3>
<p>All SRA data can be output into color space data. The utility ‘<a href="http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&amp;f=abi-dump">abi-dump</a>’ can be used to output CSFASTA and QUAL data files (with appropriate options, fastq-dump can be used to output “CSFASTQ” format).</p>
</div>
<div></div>
<div>SHRiMP, sequel and BFAST can all align color-space data in fastq format, or start directly from the<b><span style="color: #ff0000;"> (.csfasta &amp; .qual) files</span></b>. In fact bowtie can as well.</div>
<div>
<div><a href="https://wikis.utexas.edu/display/bioiteam/BFAST">https://wikis.utexas.edu/display/bioiteam/BFAST</a></div>
</div>
<div>Once aligned, the bam file can go through the normal Illumina data analysis workflow!</div>
<div>Once the data is in color-space fastq format, you can run fastqc on it directly to look at the quality control plots. If the quality is poor, it can be processed with tools such as <a href="http://cutadapt.readthedocs.io/en/stable/colorspace.html">cutadapt</a>, which can also take the reads in a <code>.csfasta</code> and a <code>.qual</code> file (this is the native SOLiD format).</div>
<div>Reference: <a href="http://cutadapt.readthedocs.io/en/stable/colorspace.html">http://cutadapt.readthedocs.io/en/stable/colorspace.html</a></div>
<div>Running fastqc directly on csfastq data gives the following:</div>
<div><img class="alignnone size-full wp-image-1852" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/08/2.png" alt="2" width="526" height="229" /></div>
<div>
<div>Reference: <a href="http://seqanswers.com/forums/showpost.php?p=59156&amp;postcount=4">http://seqanswers.com/forums/showpost.php?p=59156&amp;postcount=4</a></div>
<div>Sequencer reads have a chance of read error (e.g. spot misidentification), combined with a chance of sequence error (e.g. polymerase misread in the PCR step).</div>
<div>For sequencers that output in base space, both these errors have a similar effect on the base-space mapping.</div>
<div>For sequencers that output in color-space, the read errors result in a somewhat unexpected base-space translation even if the underlying sequence has a perfect match to the reference.</div>
<div><span style="color: #ff0000;"><b>The issues relating to color-space to base-space translation</b></span> were discussed in the thread you linked to, but here's my take on it (dumped from an email I recently sent to someone else): A color-space sequence is an encoding of adjacent dimers such that unchanging bases are encoded with '0', complementary changes are encoded with '3', the colour '1' is used for a non-complementary base change on the same side of the alphabet (AC, CA, GT, or TG), and the colour '2' is used for a non-complementary base change on a different side of the alphabet (AG, GA, CT, or TC). A table of these changes can be found here:<br />
<a href="http://www.ploscompbiol.org/article/slideshow.action?uri=info:doi/10.1371/journal.pcbi.1000386&amp;imageURI=info:doi/10.1371/journal.pcbi.1000386.g002" target="_blank">http://www.ploscompbiol.org/article/...i.1000386.g002</a><br />
This has a few nice properties (e.g. the reverse-complement of a color-space sequence is the same as the reverse of the color-space sequence, a SNP will have two transitions), but many annoying and nasty properties.</div>
<div>The first is that a color-space sequence in itself is meaningless without a base reference (usually the starting base).</div>
</div>
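<div>The dimer encoding quoted above can be written down as a small lookup table. The following is a sketch for illustration only: the function name <code>to_colorspace</code> is made up, it handles only uppercase A/C/G/T, and it prints the first base of the input as the leading letter, whereas real SOLiD reads start with the primer base instead. It goes from base space to color space purely to show the encoding; it does not convert reads back to bases.</div>

```shell
# Hypothetical encoder: first base, then one color per adjacent pair of bases.
to_colorspace() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      bases = "ACGT"
      # Row = first base, column = second base, both in the order A, C, G, T.
      table = "0123" "1032" "2301" "3210"
    }
    {
      out = substr($0, 1, 1)
      for (i = 1; i < length($0); i++) {
        a = index(bases, substr($0, i, 1)) - 1
        b = index(bases, substr($0, i + 1, 1)) - 1
        out = out substr(table, 4 * a + b + 1, 1)
      }
      print out
    }'
}

to_colorspace ATGGC   # prints A3103
to_colorspace ATCGC   # prints A3233
```

<div>The two inputs differ by a single base (G to C at the third position), yet two adjacent colors change (10 becomes 23). That is the "a SNP will have two transitions" property mentioned in the quote, and it is why a single wrong color call behaves so differently from a real SNP.</div>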
<div></div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1850.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Annotating array probes with gene IDs or symbols, and picking the highest-expressed probe for each gene</title>
		<link>http://www.bio-info-trainee.com/1502.html</link>
		<comments>http://www.bio-info-trainee.com/1502.html#comments</comments>
		<pubDate>Tue, 29 Mar 2016 10:14:06 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础数据格式]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[探针]]></category>
		<category><![CDATA[芯片]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1502</guid>
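		<description><![CDATA[Doing this in R is actually very simple. The hard part is that many packages often run into installation problems, &#8230; <a href="http://www.bio-info-trainee.com/1502.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Doing this in R is actually very simple. The hard part is that many packages often run into installation problems. Worse, some people start asking questions everywhere without even checking which array platform they have or which package corresponds to it. Their ability to teach themselves is truly worrying!</p>
<p>I previously listed the array platforms currently supported by bioconductor and their packages: <a title="Read more: getting every array probe-to-gene mapping through bioconductor packages" href="http://www.bio-info-trainee.com/1399.html" rel="bookmark">Getting every array probe-to-gene mapping through bioconductor packages</a></p>
<p>That is really a rather clumsy approach, though. It collects every kind of probe-ID-to-gene mapping, which is a detour: normally you only need to find the array's gene mapping in GEO, with no need to download that many packages. Still, the benefit is obvious: for many beginners, things are much easier if a package can solve the problem, as in the conversion below:</p>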
<blockquote>
<div>suppressPackageStartupMessages(library(CLL))</div>
<div>## This package ships with a dataset that we will use</div>
<div>data(sCLLex)  ## 24 samples in two groups, handy for testing differential expression analysis</div>
<div>library(hgu95av2.db) <strong> ## be sure you know which annotation package matches your array</strong></div>
<div><a href="http://www.bio-info-trainee.com/1399.html"><strong>## common array platforms all have a corresponding bioconductor annotation package</strong></a></div>
<div>exprSet=exprs(sCLLex)  ## expression matrix; its row names are probe IDs, which mean nothing by themselves and need converting</div>
<div>## first take all the probe IDs; <span style="color: #ff0000;"># there are three ways to get symbols here, and entrezIDs work too</span></div>
<div>probeset=rownames(exprSet)</div>
<div>Symbol=as.character(as.list(<span style="color: #ff0000;">hgu95av2SYMBOL</span>[probeset]))</div>
<div><span style="color: #ff0000;"># from the annotate package:    getSYMBOL( probeset ,"hgu95av2" )</span></div>
<div><span style="color: #ff0000;"># or use the lookUp function:   lookUp( probeset , "hgu95av2", "SYMBOL")</span></div>
<div><span style="color: #ff0000;"># these are just alternative tricks</span></div>
<div>a=cbind.data.frame(Symbol,exprSet)</div>
<div><strong><span style="color: #ff0000;">## the function below picks the probe with the highest expression for each gene</span></strong></div>
<div>rmDupID &lt;-function(a=matrix(c(1,1:5,2,2:6,2,3:7),ncol=6,byrow=T)){  ## byrow=T so the first column of the demo matrix holds the IDs 1,2,2</div>
<div>  exprSet=a[,-1]</div>
<div>  rowMeans=apply(exprSet,1,function(x) mean(as.numeric(x),na.rm=T))</div>
<div>  a=a[order(rowMeans,decreasing=T),]</div>
<div>  exprSet=a[!duplicated(a[,1]),]</div>
<div>  #exprSet=apply(exprSet,2,as.numeric)</div>
<div>  exprSet=exprSet[!is.na(exprSet[,1]),]</div>
<div>  rownames(exprSet)=exprSet[,1]</div>
<div>  exprSet=exprSet[,-1]</div>
<div>  return(exprSet)</div>
<div>}</div>
<div>exprSet=rmDupID(a)</div>
</blockquote>
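<div><strong>Picking the highest-expressed probe for each gene</strong> is just one way of handling this. It is how I usually process arrays, but it is not necessarily the best!</div>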
<div></div>
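<div>The same selection rule can be sketched outside R on an exported table. This is an illustration under my own assumptions, not the method used above verbatim: the function name <code>max_probe_per_gene</code> and the input layout (tab-separated symbol, probe ID, then one expression value per sample, no header) are hypothetical. For each symbol it keeps the probe with the highest mean expression and drops probes whose symbol is empty or NA.</div>

```shell
# Hypothetical helper. stdin: symbol<TAB>probe<TAB>value1<TAB>value2...
max_probe_per_gene() {
  awk -F '\t' '
    NF < 3 { next }                         # need at least one expression value
    $1 == "" || $1 == "NA" { next }         # drop probes without a symbol
    {
      sum = 0
      for (i = 3; i <= NF; i++) sum += $i
      mean = sum / (NF - 2)
      # Keep the row with the highest mean seen so far for this symbol.
      if (!($1 in best) || mean > best[$1]) { best[$1] = mean; row[$1] = $0 }
    }
    END { for (g in row) print row[g] }' | sort
}

# Made-up values: TP53 has two probes, and p2 has the higher mean.
printf 'TP53\tp1\t1\t3\nTP53\tp2\t5\t7\nNA\tp3\t9\t9\nEGFR\tp4\t2\t2\n' | max_probe_per_gene
```

<div>The final <code>sort</code> is only there to make the output order predictable, since awk does not guarantee the order in which array keys are visited.</div>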
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1502.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How the various genome builds correspond</title>
		<link>http://www.bio-info-trainee.com/1469.html</link>
		<comments>http://www.bio-info-trainee.com/1469.html#comments</comments>
		<pubDate>Tue, 15 Mar 2016 11:50:00 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础数据库]]></category>
		<category><![CDATA[基础数据格式]]></category>
		<category><![CDATA[未分类]]></category>
		<category><![CDATA[ENSEMBL]]></category>
		<category><![CDATA[ncbi]]></category>
		<category><![CDATA[UCSC]]></category>
		<category><![CDATA[基因组版本]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1469</guid>
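		<description><![CDATA[SOAPfuse inspired me to sort out how the various genome builds correspond. The complete edition!!! &#8230; <a href="http://www.bio-info-trainee.com/1469.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<pre>SOAPfuse inspired me to sort out how the various genome builds correspond. The complete edition!!!</pre>
<pre>No more worrying about mixed-up genome builds. I also tracked down all the download links, so you can fetch the genome fasta files, gtf annotation files and so on for any build!!!</pre>
<div>First, how NCBI builds map to UCSC names and to ENSEMBL releases:</div>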
<div></div>
<div>
<blockquote>
<div>NCBI36 (hg18): ENSEMBL release_52.</div>
<div>GRCh37 (hg19): ENSEMBL release_59/61/64/68/69/75.</div>
<div>GRCh38 (hg38): ENSEMBL  release_76/77/78/80/81/82.</div>
</blockquote>
<div></div>
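<div>As you can see, ENSEMBL release numbering is particularly complicated!!! It is easy to mix up!</div>
<div>UCSC naming is simple by comparison: just hg18, hg19 and hg38. hg19 is the most commonly used, but I recommend that everyone move to hg38.</div>
<div>NCBI looks simple too, just builds 36, 37 and 38 (NCBI36, GRCh37, GRCh38), but there is a lot hidden underneath!</div>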
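<div>The UCSC-to-NCBI correspondence is small enough to keep in a lookup function at the top of a pipeline script. This is just a sketch: the function name <code>assembly_name</code> is made up, and it covers only the three human builds discussed in this post.</div>

```shell
# Hypothetical lookup: UCSC short name to NCBI/GRC assembly name (human only).
assembly_name() {
  case "$1" in
    hg18) echo "NCBI36" ;;
    hg19) echo "GRCh37" ;;
    hg38) echo "GRCh38" ;;
    *)    return 1 ;;      # unknown build: print nothing and signal failure
  esac
}

assembly_name hg19   # prints GRCh37
```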
<div>
<blockquote>
<pre>Feb 13 2014 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/April_14_2003/">April_14_2003</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.33/">BUILD.33</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.34.1/">BUILD.34.1</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.34.2/">BUILD.34.2</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.34.3/">BUILD.34.3</a>
Apr 06 2006 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.35.1/">BUILD.35.1</a>
Aug 03 2009 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.36.1/">BUILD.36.1</a>
Aug 03 2009 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.36.2/">BUILD.36.2</a>
Sep 04 2012 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.36.3/">BUILD.36.3</a>
Jun 30 2011 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.1/">BUILD.37.1</a>
Sep 07 2011 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.2/">BUILD.37.2</a>
Dec 12 2012 00:00    Directory <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.3/">BUILD.37.3</a></pre>
</blockquote>
</div>
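<div>As you can see, there are 37.1, 37.2, 37.3 and so on. These minor versions generally mean the annotation was updated; the genome sequence itself generally does not change!!!</div>
<div>In any case, just remember that the hg19 genome is about 3G in size, or 800 to 900 MB compressed!!!</div>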
<div></div>
<div>When downloading GTF annotation files, the genome build matters all the more!!!</div>
<div></div>
<div>For NCBI: <span style="font-family: Arial,Helvetica,sans-serif;"><a href="ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/GFF/">ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/GFF/          ## latest build (hg38)</a></span></div>
<div><span style="font-family: Arial,Helvetica,sans-serif;"><a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/">ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/    ## other builds</a></span></div>
<div></div>
<div>For ensembl:</div>
<div><a href="ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz" rel="nofollow">ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz</a></div>
<div>Change the release number in the middle to get any release: <a href="ftp://ftp.ensembl.org/pub/">ftp://ftp.ensembl.org/pub/</a></div>
<div>For UCSC it is a bit more of a hassle:</div>
<div>
<div>A series of options has to be selected:</div>
<div><a href="http://genome.ucsc.edu/cgi-bin/hgTables">http://genome.ucsc.edu/cgi-bin/hgTables</a></div>
<div></div>
<blockquote>
<div>1. Navigate to <a href="http://genome.ucsc.edu/cgi-bin/hgTables" target="_blank" rel="nofollow">http://genome.ucsc.edu/cgi-bin/hgTables</a></div>
<div></div>
<div>2. Select the following options:<br />
clade: Mammal<br />
genome: Human<br />
assembly: Feb. 2009 (GRCh37/hg19)<br />
group: Genes and Gene Predictions<br />
track: UCSC Genes<br />
table: knownGene<br />
region: Select "genome" for the entire genome.<br />
output format: GTF - gene transfer format<br />
output file: enter a file name to save your results to a file, or leave blank to display results in the browser</div>
<div></div>
<div>3. Click 'get output'.</div>
</blockquote>
</div>
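<div> Now for the key part: with the build relationships sorted out, it is time to download!</div>
<div>Downloading from UCSC is very convenient; just build the url from the genome's short name:</div>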
<div>
<blockquote>
<div><a href="http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz</a></div>
<div><a href="http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz</a></div>
<div><a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz</a></div>
<div><a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/chromFa.tar.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/chromFa.tar.gz</a></div>
</blockquote>
<div>Or use a shell script to download specific chromosomes:</div>
<blockquote>
<div>for i in $(seq 1 22) X Y M;<br />
do echo $i;<br />
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr${i}.fa.gz;</div>
<div>## NCBI can be used here instead: ftp://ftp.ncbi.nih.gov/genomes/M_musculus/ARCHIVE/MGSCv3_Release3/Assembled_Chromosomes/ (files with the chr prefix)<br />
done<br />
gunzip *.gz<br />
for i in $(seq 1 22) X Y M;<br />
do cat chr${i}.fa &gt;&gt; hg19.fasta;<br />
done<br />
## remove the per-chromosome files, which are named chr*.fa after gunzip<br />
rm -fr chr*.fa</div>
</blockquote>
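<div>Before starting a long download it can help to print the URLs first and check them. The sketch below only builds the list and touches no network. The function name <code>ucsc_chrom_urls</code> is made up, the chromosome list is the human one used in the loop above, and reusing the hg19 URL pattern for other builds is an assumption that should be checked against the UCSC download server.</div>

```shell
# Hypothetical helper: print one per-chromosome download URL per line for a human build.
ucsc_chrom_urls() {
  build=$1
  for chrom in $(seq 1 22) X Y M; do
    printf 'http://hgdownload.cse.ucsc.edu/goldenPath/%s/chromosomes/chr%s.fa.gz\n' "$build" "$chrom"
  done
}

# Show the first three URLs for hg19.
ucsc_chrom_urls hg19 | sed -n '1,3p'
```

<div>Once the list looks right, it can be saved to a file and passed to <code>wget -i</code>.</div>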
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1469.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting array probe-to-gene mappings with R, a trilogy: bioconductor</title>
		<link>http://www.bio-info-trainee.com/1399.html</link>
		<comments>http://www.bio-info-trainee.com/1399.html#comments</comments>
		<pubDate>Mon, 15 Feb 2016 15:41:55 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[基础数据库]]></category>
		<category><![CDATA[基础数据格式]]></category>
		<category><![CDATA[生信基础]]></category>
		<category><![CDATA[探针]]></category>
		<category><![CDATA[生物芯片]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1399</guid>
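		<description><![CDATA[There are an awful lot of microarray types out there! But not many are both important and commonly used! Analyzing array data generally &#8230; <a href="http://www.bio-info-trainee.com/1399.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>There are an awful lot of microarray types out there!</p>
<div>But not many of them are both important and commonly used!</div>
<div>Analyzing array data generally requires converting probe IDs to gene IDs, and I usually prefer the entrez ID.</div>
<div><strong><span style="color: #ff0000;">There are generally three ways to get the mapping between array probes and genes.</span></strong></div>
<div><strong><span style="color: #ff0000;">The gold standard, of course, is to download it straight from the array vendor's website!!!</span></strong></div>
<div><strong><span style="color: #ff0000;">Another is to use bioconductor packages directly</span></strong></p>
<div><strong><span style="color: #ff0000;">The third is to download files from NCBI and parse them!</span></strong></div>
<div>First, the vendor's website: the mapping can certainly be found there, otherwise the array would be pointless!</div>
<div>Then there is the NCBI download, which is rather large:</div>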
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL6947">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL6947</a></div>
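<div>Both of these approaches are tedious, because you have to go through the platforms one at a time!</div>
<div>So what I want to show next is how to use R's bioconductor packages to get the probe-to-gene mappings in bulk!</div>
<div>Important arrays generally have a package in bioconductor, and one R package can list the array platforms that have annotation available. I picked out the common species, as follows:</div>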
<div></div>
<div>
<blockquote>
<div>        gpl           organism                  bioc_package</div>
<div>1     GPL32       Mus musculus                        mgu74a</div>
<div>2     GPL33       Mus musculus                        mgu74b</div>
<div>3     GPL34       Mus musculus                        mgu74c</div>
<div>6     GPL74       Homo sapiens                        hcg110</div>
<div>7     GPL75       Mus musculus                     mu11ksuba</div>
<div>8     GPL76       Mus musculus                     mu11ksubb</div>
<div>9     GPL77       Mus musculus                     mu19ksuba</div>
<div>10    GPL78       Mus musculus                     mu19ksubb</div>
<div>11    GPL79       Mus musculus                     mu19ksubc</div>
<div>12    GPL80       Homo sapiens                        hu6800</div>
<div>13    GPL81       Mus musculus                      mgu74av2</div>
<div>14    GPL82       Mus musculus                      mgu74bv2</div>
<div>15    GPL83       Mus musculus                      mgu74cv2</div>
<div>16    GPL85  Rattus norvegicus                        rgu34a</div>
<div>17    GPL86  Rattus norvegicus                        rgu34b</div>
<div>18    GPL87  Rattus norvegicus                        rgu34c</div>
<div>19    GPL88  Rattus norvegicus                         rnu34</div>
<div>20    GPL89  Rattus norvegicus                         rtu34</div>
<div>22    GPL91       Homo sapiens                      hgu95av2</div>
<div>23    GPL92       Homo sapiens                        hgu95b</div>
<div>24    GPL93       Homo sapiens                        hgu95c</div>
<div>25    GPL94       Homo sapiens                        hgu95d</div>
<div>26    GPL95       Homo sapiens                        hgu95e</div>
<div>27    GPL96       Homo sapiens                       hgu133a</div>
<div>28    GPL97       Homo sapiens                       hgu133b</div>
<div>29    GPL98       Homo sapiens                     hu35ksuba</div>
<div>30    GPL99       Homo sapiens                     hu35ksubb</div>
<div>31   GPL100       Homo sapiens                     hu35ksubc</div>
<div>32   GPL101       Homo sapiens                     hu35ksubd</div>
<div>36   GPL201       Homo sapiens                       hgfocus</div>
<div>37   GPL339       Mus musculus                       moe430a</div>
<div>38   GPL340       Mus musculus                     mouse4302</div>
<div>39   GPL341  Rattus norvegicus                       rae230a</div>
<div>40   GPL342  Rattus norvegicus                       rae230b</div>
<div>41   GPL570       Homo sapiens                   hgu133plus2</div>
<div>42   GPL571       Homo sapiens                      hgu133a2</div>
<div>43   GPL886       Homo sapiens                     hgug4111a</div>
<div>44   GPL887       Homo sapiens                     hgug4110b</div>
<div>45  GPL1261       Mus musculus                    mouse430a2</div>
<div>49  GPL1352       Homo sapiens                       u133x3p</div>
<div>50  GPL1355  Rattus norvegicus                       rat2302</div>
<div>51  GPL1708       Homo sapiens                     hgug4112a</div>
<div>54  GPL2891       Homo sapiens                       h20kcod</div>
<div>55  GPL2898  Rattus norvegicus                     adme16cod</div>
<div>60  GPL3921       Homo sapiens                     hthgu133a</div>
<div>63  GPL4191       Homo sapiens                       h10kcod</div>
<div>64  GPL5689       Homo sapiens                     hgug4100a</div>
<div>65  GPL6097       Homo sapiens               illuminaHumanv1</div>
<div>66  GPL6102       Homo sapiens               illuminaHumanv2</div>
<div>67  GPL6244       Homo sapiens   hugene10sttranscriptcluster</div>
<div>68  GPL6947       Homo sapiens               illuminaHumanv3</div>
<div>69  GPL8300       Homo sapiens                      hgu95av2</div>
<div>70  GPL8490       Homo sapiens   IlluminaHumanMethylation27k</div>
<div>71 GPL10558       Homo sapiens               illuminaHumanv4</div>
<div>72 GPL11532       Homo sapiens   hugene11sttranscriptcluster</div>
<div>73 GPL13497       Homo sapiens         HsAgilentDesign026652</div>
<div>74 GPL13534       Homo sapiens  IlluminaHumanMethylation450k</div>
<div>75 GPL13667       Homo sapiens                        hgu219</div>
<div>76 GPL15380       Homo sapiens      GGHumanMethCancerPanelv1</div>
<div>77 GPL15396       Homo sapiens                     hthgu133b</div>
<div>78 GPL17897       Homo sapiens                     hthgu133a</div>
</blockquote>
</div>
<div>First, all of these packages need to be installed</div>
<div>
<blockquote>
<div>gpl_info=read.csv("GPL_info.csv",stringsAsFactors = F)</div>
<div>### first download all of the annotation packages from bioconductor</div>
<div>for (i in 1:nrow(gpl_info)){</div>
<div>  print(i)</div>
<div>  platform=gpl_info[i,4]</div>
<div>  platform=gsub('^ ',"",platform) ## strip the leading space left over from how I built the package strings</div>
<div>  #platformDB='hgu95av2.db'</div>
<div>  platformDB=paste(platform,".db",sep="")</div>
<div>  if( platformDB  %in% rownames(installed.packages()) == FALSE) {</div>
<div>    ## BiocInstaller::biocLite() has been retired; BiocManager::install() is its replacement</div>
<div>    BiocManager::install(platformDB)</div>
<div>  }</div>
<div>}</div>
</blockquote>
</div>
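<div>Since several GPL platforms share one package (GPL91 and GPL8300 both map to hgu95av2 in the table above), it can be worth reducing the table to a unique list of package names before installing anything. The sketch below assumes the table was saved as whitespace-separated text with the package name in the last column; the function name <code>db_package_names</code> is made up, and appending ".db" simply mirrors what the R loop does, without checking that every such package exists.</div>

```shell
# Hypothetical helper: last column of each row, with ".db" appended, de-duplicated.
db_package_names() {
  awk 'NF == 0 { next } { print $NF ".db" }' | sort -u
}

# Three rows copied from the table above; two of them share a package.
printf '%s\n' \
  '22    GPL91       Homo sapiens                      hgu95av2' \
  '69  GPL8300       Homo sapiens                      hgu95av2' \
  '41   GPL570       Homo sapiens                   hgu133plus2' | db_package_names
```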
<blockquote>
<div>Once all the packages are installed, the probe-to-gene mappings can be exported in bulk!</div>
</blockquote>
<div>
<blockquote>
<div>for (i in 1:nrow(gpl_info)){</div>
<div>  print(i)</div>
<div>  platform=gpl_info[i,4]</div>
<div>  platform=gsub('^ ',"",platform)</div>
<div>  #platformDB='hgu95av2.db'</div>
<div>  platformDB=paste(platform,".db",sep="")</div>
<div></div>
<div>  if( platformDB  %in% rownames(installed.packages()) != FALSE) {</div>
<div>    library(platformDB,character.only = T)</div>
<div>    #tmp=paste('head(mappedkeys(',platform,'ENTREZID))',sep='')</div>
<div>    #eval(parse(text = tmp))</div>
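<div>### the key step: run a string as if it were a command</div>
<div>    all_probe=eval(parse(text = paste('mappedkeys(',platform,'ENTREZID)',sep='')))</div>
<div>    EGID &lt;- as.numeric(lookUp(all_probe, platformDB, "ENTREZID"))  ## lookUp() comes from the annotate package, which must be loaded</div>
<div>## then write the results out yourself</div>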
<div>  }</div>
<div>}</div>
</blockquote>
</div>
<div>Reference: <a href="http://blog.sina.com.cn/s/blog_62b37bfe0101jbuq.html">http://blog.sina.com.cn/s/blog_62b37bfe0101jbuq.html</a></div>
<div></div>
</div>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1399.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An introduction to the copy number variation detection array</title>
		<link>http://www.bio-info-trainee.com/1295.html</link>
		<comments>http://www.bio-info-trainee.com/1295.html#comments</comments>
		<pubDate>Wed, 06 Jan 2016 01:00:08 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础数据格式]]></category>
		<category><![CDATA[cnv]]></category>
		<category><![CDATA[snp]]></category>
		<category><![CDATA[拷贝数]]></category>
		<category><![CDATA[芯片]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1295</guid>
		<description><![CDATA[The copy number variation detection array here refers to the Affymetrix Genome-Wide Hu &#8230; <a href="http://www.bio-info-trainee.com/1295.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>The copy number variation detection array here refers to the Affymetrix Genome-Wide Human SNP Array 6.0</p>
<div>Its cel data needs to be processed into segment and genotype data</div>
</div>
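<div>This array is used very heavily in the TCGA project, where it is standard equipment. Just remember that it is an array for detecting copy number variation and that it can also call some genotypes <span class="Apple-converted-space"> </span></div>
<div>The Affymetrix Genome-Wide Human SNP Array 6.0 is billed as the only platform that can truly turn CNPs (copy number polymorphisms) into a high-resolution reference map. Its main applications are genome-wide SNP genotyping, genome-wide CNV typing, genome-wide association analysis and genome-wide linkage analysis. Beyond genotyping, it also supports copy number and LOH studies, which enables UPD detection, paternity testing, parent-of-origin analysis of abnormalities (for UPD and deletions), homozygosity analysis and kinship testing.</div>
<div>Reference: <a href="http://www.affymetrix.com/support/technical/byproduct.affx?product=genomewidesnp_6">http://www.affymetrix.com/support/technical/byproduct.affx?product=genomewidesnp_6</a></div>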
<div></div>
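<div>SNP Array 6.0 is the new-generation SNP array that Affymetrix released after the Mapping 10k, 100k and 500k and SNP 5.0 arrays. A single array can genotype <b>906,600 SNPs</b> per sample. About 482,000 of these SNPs come from the earlier 500K and SNP 5.0 arrays. The remaining 424,000 include tag SNPs from the international HapMap project, more representative SNPs on the X and Y chromosomes and in mitochondria, and SNPs from recombination hotspots or newly added to dbSNP after the 500K array was designed. <b>The array also carries 946,000 non-polymorphic CNV probes</b> for detecting copy number variation. Of these, 202,000 probes target 5,677 known copy number variation regions taken from the Toronto Database of Genomic Variants; these regions resolve into 3,182 non-overlapping segments, each interrogated by 61 probes. Besides these known copy number polymorphic regions, more than 744,000 probes are spread evenly across the whole genome to discover unknown copy number variation regions. Both SNP and CNV probes are distributed densely and evenly over the whole genome<b>, serving as a tool for copy number variation and loss of heterozygosity (LOH) detection that can find small chromosomal gains and losses</b>. This gives life science researchers a powerful tool that improves the chance of finding genes associated with complex diseases.<br />
Through a collaboration with the Broad Institute of MIT and Harvard, the SNP 6.0 array reached a new level of data accuracy and consistency. The accompanying Genotyping Console software handles SNP 6.0 array data processing, genome-wide genetic analysis and quality control.</div>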
<div>
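<p><strong>Product features:</strong></p>
<p>1. Covers more than 1,800,000 genetic variation markers: more than <b>906,600 SNPs and more than 946,000 probes for detecting copy number variation (CNV)</b>;</p>
<p>2. SNP and CNV probes are distributed densely and evenly across the whole genome, so the array supports both accurate SNP genotyping and CNV studies;</p>
<p>3. 744,000 probes are spread evenly across the genome to discover previously unknown copy number variant regions;</p>
<p>4. Suitable for copy-neutral LOH/UPD detection, paternity testing, homozygosity analysis, kinship determination, and research on genetic and other diseases.</p>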
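<p>The probe counts quoted in this post can be cross-checked with a little shell arithmetic. This is only a sanity check on the published figures; the variable names are made up for illustration.</p>
<pre># Probe counts as quoted in the post (variable names are illustrative)
snp_probes=906600        # SNP genotyping probes
cnv_probes=946000        # non-polymorphic copy number probes
known_cnv_probes=202000  # probes targeting known CNV regions
genome_wide_probes=744000  # probes spread across the genome

# SNP plus CNV probes should exceed the advertised 1,800,000 markers
echo "total markers: $((snp_probes + cnv_probes))"

# The two CNV probe groups should add up to the CNV probe total
echo "CNV probes: $((known_cnv_probes + genome_wide_probes))"</pre>
<p>The two CNV probe groups add up exactly to the stated CNV total, and the SNP and CNV probes together come to just over 1.85 million, which matches the "more than 1,800,000" claim.</p>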
<p>Reference: <a href="http://www.biomart.cn/specials/cnv2014/article/84169">http://www.biomart.cn/specials/cnv2014/article/84169</a></div>
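<div>This array can be looked up in NCBI's GEO database, which already holds data for more than ten thousand samples.</div>
<div>The first entry in the figure is nearly a thousand samples from the CCLE project, which may have used a customized version of the SNP 6.0 array.</div>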
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard.png"><img class="alignnone size-full wp-image-1296" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard.png" alt="clipboard" width="1028" height="343" /></a></div>
<div>A great many papers have been published using data from this array; see the list: <a href="http://media.affymetrix.com/support/technical/other/snp6_array_publications.pdf">http://media.affymetrix.com/support/technical/other/snp6_array_publications.pdf</a></div>
<div>A 2010 Nature paper describes how to use PICNIC to study CNV: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3145113/">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3145113/</a></div>
<div>Another 2010 paper proposes new software for analyzing CNV data from this array: <a href="http://bioinformatics.oxfordjournals.org/content/26/11/1395.long">http://bioinformatics.oxfordjournals.org/content/26/11/1395.long</a></div>
<div>Many software packages offer the same functionality, including a family of R packages from Bioconductor:</div>
<div><a href="http://www.bioconductor.org/help/search/index.html?q=cnv/">http://www.bioconductor.org/help/search/index.html?q=cnv/</a></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard2.png"><img class="alignnone size-full wp-image-1297" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/clipboard2.png" alt="clipboard2" width="710" height="602" /></a></div>
<div>Almost any of these entries leads to plenty of raw data that you can analyze yourself:</div>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/browse/?view=samples&amp;platform=6801">http://www.ncbi.nlm.nih.gov/geo/browse/?view=samples&amp;platform=6801</a></div>
<div>For example: <a href="ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1949nnn/GSM1949207/suppl/GSM1949207_SB_CID0102B_071708.CEL.gz">ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1949nnn/GSM1949207/suppl/GSM1949207%5FSB%5FCID0102B%5F071708%2ECEL%2Egz</a></div>
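<div>The directory part of that FTP address follows a regular pattern, so it can be built from the sample accession alone. The sketch below assumes the layout seen in the URL above, where the parent folder is the accession with its last three digits replaced by "nnn". The function name geo_suppl_dir is made up for this example, and the CEL file name itself still has to be looked up in the directory listing.</div>
<div>
<pre># Build the GEO FTP supplementary-file directory for a sample accession.
# Assumption: the parent folder is the accession with its last three
# characters replaced by "nnn", as in the example URL in this post.
geo_suppl_dir() {
  gsm="$1"
  parent_dir="${gsm%???}nnn"   # GSM1949207 becomes GSM1949nnn
  echo "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/${parent_dir}/${gsm}/suppl/"
}

geo_suppl_dir GSM1949207</pre>
</div>
<div>Listing that directory (for example with wget or a browser) shows the CEL file to download.</div>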
<div></div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1295.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>liftOver: Converting Coordinates Between Genome Assemblies</title>
		<link>http://www.bio-info-trainee.com/990.html</link>
		<comments>http://www.bio-info-trainee.com/990.html#comments</comments>
		<pubDate>Mon, 07 Sep 2015 02:27:39 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic Data Formats]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=990</guid>
		<description><![CDATA[Download location: http://hgdownload.cse.ucsc.edu/admi &#8230; <a href="http://www.bio-info-trainee.com/990.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>Download location: http://hgdownload.cse.<em>ucsc</em>.edu/admin/exe/</div>
<div>I usually use the Linux build: wget <a href="http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver">http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver</a></div>
<div>Usage example: converting from hg38 to hg19</div>
<div>hg19 is still the mainstream genome assembly, but the field keeps moving and a lot of data is now released in hg38 coordinates.</div>
<div>For example, I downloaded pfam.df, a protein domain annotation file that gives the hg38 coordinates of each annotated domain on the human genome.</div>
<div>The output of head pfam.hg38.df shows the layout:</div>
<div>PFAMID chr start end strand</div>
<div>Helicase_C_2 chr1 12190 12689 +</div>
<div>7tm_4 chr1 69157 69220 +</div>
<div>7TM_GPCR_Srsx chr1 69184 69817 +</div>
<div>7tm_1 chr1 69190 69931 +</div>
<div>7tm_4 chr1 69490 69910 +</div>
<div>7tm_1 chr1 450816 451557 -</div>
<div>7tm_4 chr1 450837 451263 -</div>
<div>EPV_E5 chr1 450924 450936 -</div>
<div>7TM_GPCR_Srsx chr1 450927 451572 -</div>
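<div>To convert the domain start and end coordinates to hg19, I need UCSC's liftOver tool.</div>
<div>The tool needs a chain file that describes the mapping between the two assemblies: <a href="http://hgdownload-test.cse.ucsc.edu/goldenPath/hg38/liftOver/">http://hgdownload-test.cse.ucsc.edu/goldenPath/hg38/liftOver/</a></div>
<div>The one I need here is <a href="http://hgdownload-test.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz">http://hgdownload-test.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz</a>; decompress it with gunzip first, since the commands below refer to hg38ToHg19.over.chain.</div>
<div>liftOver only converts input in BED or another format it supports:</div>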
<div><a href="http://www.ensembl.org/info/website/upload/bed.html">http://www.ensembl.org/info/website/upload/bed.html</a></div>
<div>Example:</div>
<div>
<pre>chr7  127471196  127472363  Pos1  0  +  127471196  127472363  255,0,0
chr7  127472363  127473530  Pos2  0  +  127472363  127473530  255,0,0</pre>
</div>
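<div>This is easy: pad your own file with a few filler columns to produce this 9-column layout. The NR&gt;1 condition drops the header line, which is not valid BED.</div>
<div>sed 's/\r//g' pfam.hg38.df | awk 'NR&gt;1 {print $2,$3,$4,$1,0,$5,$3,$4,"255,0,0"}' &gt; pfam.hg38.bed</div>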
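<div>To see what the reshaping does without touching the real file, the same column shuffle can be run on two rows from the listing above. This is a minimal sketch: df_to_bed is a made-up helper name, and skipping the first line assumes the file starts with the PFAMID header shown earlier.</div>
<div>
<pre># Reshape the five-column listing (PFAMID chr start end strand) into 9-column BED.
# df_to_bed is a hypothetical helper; it reads stdin and writes stdout.
df_to_bed() {
  # strip Windows carriage returns, skip the header, reorder the columns
  tr -d '\r' | awk 'NR != 1 { print $2, $3, $4, $1, 0, $5, $3, $4, "255,0,0" }'
}

# Header plus two sample rows copied from the listing above
printf '%s\n' \
  'PFAMID chr start end strand' \
  '7tm_4 chr1 69157 69220 +' \
  '7tm_1 chr1 450816 451557 -' | df_to_bed</pre>
</div>
<div>Each output line carries chromosome, start, end, domain name, a placeholder score of 0, strand, the start and end repeated as the thick-region columns, and a fixed color.</div>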
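<div>With that, all the files needed for the conversion are in place, and the command itself is very simple:</div>
<div>chmod +x liftOver</div>
<div> ./liftOver pfam.hg38.bed hg38ToHg19.over.chain pfam.hg19.bed unmap</div>
<div>A successful run prints the messages below. Errors usually mean the input is not valid BED; remove comment lines and anything else that does not fit the format.</div>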
<div>Reading liftover chains</div>
<div>Mapping coordinates</div>
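<div>After a run it is worth checking how many ranges were converted and how many ended up in the unmap file. The sketch below assumes the usual layout of that file, in which each rejected record is preceded by a reason line starting with "#"; count_records is a made-up helper name.</div>
<div>
<pre># Count data records in a liftOver output file, ignoring "#" reason lines.
# count_records is a hypothetical helper, not part of liftOver itself.
count_records() {
  grep -vc '^#' "$1"
}

# Usage, with the file names from the command above:
#   count_records pfam.hg19.bed   # ranges that were converted
#   count_records unmap           # ranges that could not be converted</pre>
</div>
<div>If the two counts do not add up to the number of lines in pfam.hg38.bed, some input lines were probably malformed.</div>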
<div>A quick check after the conversion confirms that the coordinates really did change. Only the first few columns matter here.</div>
<div>grep -w p53 *bed</div>
<div>pfam.hg19.bed:chr11 44956439 44959858 p53-inducible11 0 - 44956439 44959858 255,0,0</div>
<div>pfam.hg19.bed:chr11 44956439 44959767 p53-inducible11 0 - 44956439 44959767 255,0,0</div>
<div>pfam.hg19.bed:chr2 669635 675557 p53-inducible11 0 - 669635 675557 255,0,0</div>
<div>pfam.hg19.bed:chr22 35660826 35660982 p53-inducible11 0 + 35660826 35660982 255,0,0</div>
<div>Compare these with the original hg38 coordinates to see the change:</div>
<div>pfam.hg38.bed:chr11 44934888 44938307 p53-inducible11 0 - 44934888 44938307 255,0,0</div>
<div>pfam.hg38.bed:chr11 44934888 44938216 p53-inducible11 0 - 44934888 44938216 255,0,0</div>
<div>pfam.hg38.bed:chr2 669635 675557 p53-inducible11 0 - 669635 675557 255,0,0</div>
<div>pfam.hg38.bed:chr22 35264833 35264989 p53-inducible11 0 + 35264833 35264989 255,0,0</div>
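<div>The size of the change can be worked out directly from the rows above. This small sketch subtracts an hg38 coordinate from its hg19 counterpart; shift_between is a made-up helper name and the numbers are copied from the grep output.</div>
<div>
<pre># Offset between an hg38 coordinate (first argument) and the matching
# hg19 coordinate (second argument). Hypothetical helper for illustration.
shift_between() {
  echo $(( $2 - $1 ))
}

shift_between 44934888 44956439   # chr11 start
shift_between 44938307 44959858   # chr11 end
shift_between 35264833 35660826   # chr22 start
shift_between 35264989 35660982   # chr22 end
shift_between 669635 669635       # chr2 start</pre>
</div>
<div>Both ends of each range move by the same amount (21551 on chr11 and 395993 on chr22), so the domain lengths are preserved, while the chr2 range does not move at all between the two assemblies.</div>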
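<div>Bioconductor packages in R can also do this coordinate conversion: <a href="http://www.bioconductor.org/help/workflows/liftOver/">http://www.bioconductor.org/help/workflows/liftOver/</a></div>
<div>With this approach you can carry on directly from the downloaded pfam.df data, which is a bit more convenient.</div>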
<div>
<div>
<pre>My data looks like this; it has to be turned into a GRanges object by hand</pre>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/09/1.png"><img class="alignnone size-full wp-image-991" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/09/1.png" alt="1" width="442" height="224" /></a></p>
</div>
<div>
<pre>library(GenomicRanges)</pre>
</div>
<div>
<pre># a is the data frame shown above, with columns PFAMID, chr, start, end, strand
pfam.hg38 &lt;- GRanges(seqnames=Rle(a[,2]),
               ranges=IRanges(a[,3], a[,4]),
               strand=a[,5])</pre>
</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/09/2.png"><img class="alignnone size-full wp-image-992" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/09/2.png" alt="2" width="498" height="375" /></a></div>
<div>That is all it takes. This is a very bare-bones GRanges object, but it can now be converted with the liftOver function in R.</div>
</div>
<div>
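<pre>library(rtracklayer)
# import.chain() expects the uncompressed chain file
ch = import.chain("hg38ToHg19.over.chain")
# liftOver() returns a GRangesList with one element per input range
pfam.hg19 = liftOver(pfam.hg38, ch)
# flatten the list into a single GRanges object
pfam.hg19 = unlist(pfam.hg19)</pre>
</div>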
<div></div>
<div>Finally, write the converted pfam.hg19 object out to a file.</div>
<div>Reference: <a href="http://www.zilhua.com/906.html">http://www.zilhua.com/906.html</a></div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/990.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
