<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; GWAS</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/gwas/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>You are welcome to join the discussion on the biotrainee.com forum, or to follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Highly recommended: the SNPedia database, a very rich collection of SNP records</title>
		<link>http://www.bio-info-trainee.com/2100.html</link>
		<comments>http://www.bio-info-trainee.com/2100.html#comments</comments>
		<pubDate>Thu, 01 Dec 2016 10:09:44 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic databases]]></category>
		<category><![CDATA[dbsnp]]></category>
		<category><![CDATA[GWAS]]></category>
		<category><![CDATA[snpedia]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2100</guid>
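		<description><![CDATA[I happened to have just received my own whole-genome sequencing data, and a few days ago an article shared on my WeChat Moments mentioned that studies have shown &#8230; <a href="http://www.bio-info-trainee.com/2100.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>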
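				<content:encoded><![CDATA[<div>I happened to have just received my own whole-genome sequencing data, and a few days ago an article shared on my WeChat Moments mentioned that <strong><span style="color: #ff0000;">studies have shown that rs7574865 in STAT4 and rs9275319 in HLA-DQ are genetic susceptibility loci for hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) in the Chinese population,</span></strong> so I wanted to check my own genotype at these two sites while I was at it. The usual workflow is to call all variants first, annotate them with VEP/SnpEff, and then see whether these two sites are among them. But since I only care about these two sites, I look up their genomic coordinates from the rsIDs and then call variants directly at those positions. Until now I had always used dbSNP to look up the genomic coordinates of an rsID:</div>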
<blockquote>
<div>mkdir -p ~/annotation/variation/human/dbSNP</div>
<div>cd ~/annotation/variation/human/dbSNP</div>
<div>## https://www.ncbi.nlm.nih.gov/projects/SNP/</div>
<div>## ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/</div>
<div>## ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/</div>
<div>nohup wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/All_20160601.vcf.gz &amp;</div>
<div>wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/All_20160601.vcf.gz.tbi</div>
</blockquote>
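<div>As a minimal sketch (not part of the original workflow), once a dbSNP VCF such as All_20160601.vcf.gz is on disk, the coordinates of a few rsIDs can be pulled out by streaming through the file and matching the ID column. The function name <code>find_rsids</code> is hypothetical, and the code assumes the standard VCF layout in which the first five tab-separated columns are CHROM, POS, ID, REF and ALT.</div>

```python
import gzip


def find_rsids(vcf_path, wanted):
    """Return {rsID: (chrom, pos, ref, alt)} for the requested rsIDs.

    Streams a gzip-compressed VCF line by line, so the whole file is
    never held in memory, and stops as soon as every rsID was found.
    """
    remaining = set(wanted)
    found = {}
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # meta-information and header lines
                continue
            # Assumption: standard VCF column order CHROM, POS, ID, REF, ALT.
            chrom, pos, rsid, ref, alt = line.split("\t", 5)[:5]
            if rsid in remaining:
                found[rsid] = (chrom, int(pos), ref, alt)
                remaining.discard(rsid)
                if not remaining:
                    break
    return found
```

<div>Calling <code>find_rsids("All_20160601.vcf.gz", ["rs7574865", "rs9275319"])</code> would scan the file once; for a file of this size that still takes a while, which is why the web lookup is more convenient for one or two sites.</div>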
<p><span id="more-2100"></span></p>
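<div>For example, I would use the wget commands above to download All_20160601.vcf.gz and search it for the coordinates of the dbSNP entries I want. Of course, this file is huge; if you only need to look up one or two sites it is not worth the effort. dbSNP also has a web interface, and you only need to edit the URL:</div>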
<div><a href="https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=7574865">https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=7574865</a></div>
<div><a href="https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs9275319">https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=rs9275319</a></div>
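<div>That gives you all the information about the variant with very little effort. This time, however, when I googled the rsID, the top hit was not dbSNP but another database, SNPedia. After a quick look around I found it really well done and well worth a strong recommendation.</div>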
<div><a href="https://www.snpedia.com/index.php/Rs7574865">https://www.snpedia.com/index.php/Rs7574865</a></div>
<div><a href="https://www.snpedia.com/index.php/Rs9275319">https://www.snpedia.com/index.php/Rs9275319</a></div>
<div>Here too, editing the URL is all it takes to retrieve the corresponding record.</div>
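<div>The URL editing described above can be sketched as a pair of small helpers. The function names are hypothetical, and the URL patterns are simply copied from the four links in this post, so they may no longer match what the sites use today.</div>

```python
def rs_number(rsid):
    """Return the numeric part of an rsID such as 'rs7574865' or 'Rs7574865'."""
    text = str(rsid).strip()
    if text[:2].lower() == "rs":
        text = text[2:]
    if not text.isdigit():
        raise ValueError(f"not a valid rsID: {rsid!r}")
    return text


def dbsnp_url(rsid):
    # Pattern taken from the dbSNP link in this post (assumed still valid).
    return "https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=" + rs_number(rsid)


def snpedia_url(rsid):
    # SNPedia page names in this post start with a capital R: Rs7574865.
    return "https://www.snpedia.com/index.php/Rs" + rs_number(rsid)
```

<div>For instance, <code>snpedia_url("rs9275319")</code> reproduces the second SNPedia link shown above.</div>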
<div></div>
<div>Its real strength, though, is that it collects links to a great many other databases:</div>
<div>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td>Reference</td>
<td><a href="https://www.snpedia.com/index.php/GRCh38" target="_blank">GRCh38 38.1/141</a></td>
</tr>
<tr>
<td>Chromosome</td>
<td>2</td>
</tr>
<tr>
<td>Position</td>
<td>191099907</td>
</tr>
<tr>
<td>Gene</td>
<td><a href="https://www.snpedia.com/index.php/STAT4" target="_blank">STAT4</a></td>
</tr>
</tbody>
</table>
</div>
<div>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td>is a</td>
<td><a href="https://www.snpedia.com/index.php/Snp" target="_blank">snp</a></td>
</tr>
<tr>
<td>is</td>
<td><a href="https://www.snpedia.com/index.php/Special:WhatLinksHere/Rs7574865" target="_blank">mentioned by</a></td>
</tr>
<tr>
<td>dbSNP</td>
<td><a href="http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>ebi</td>
<td><a href="https://www.ebi.ac.uk/gwas/search?query=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>HLI</td>
<td><a href="https://search.hli.io/?q=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>Exac</td>
<td><a href="http://exac.broadinstitute.org/awesome?query=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>Varsome</td>
<td><a href="https://varsome.com/variant/hg19/rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>Map</td>
<td><a href="http://popgen.uchicago.edu/ggv/?search=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>PheGenI</td>
<td><a href="http://www.ncbi.nlm.nih.gov/gap/PheGenI?tab=2&amp;rs=7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td><a href="https://www.snpedia.com/index.php/Help_(hapmap)" target="_blank">hapmap</a></td>
<td><a href="http://hapmap.ncbi.nlm.nih.gov/cgi-perl/gbrowse/hapmap27_B36/?name=SNP%3Ars7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td><a href="https://www.snpedia.com/index.php/1000_genomes" target="_blank">1000 genomes</a></td>
<td><a href="http://browser.1000genomes.org/Homo_sapiens/Variation/Population?v=rs7574865;vdb=variation" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>hgdp</td>
<td><a href="http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/?name=SNP%3Ars7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>ensembl</td>
<td><a href="http://www.ensembl.org/Homo_sapiens/snpview?source=dbSNP;snp=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>gopubmed</td>
<td><a href="http://www.gopubmed.org/search?q=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>geneview</td>
<td><a href="http://bc3.informatik.hu-berlin.de/search?gv_search_query=RS:7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>scholar</td>
<td><a href="http://scholar.google.com/scholar?q=rs7574865&amp;as_subj=bio" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>google</td>
<td><a href="http://www.google.com/search?hl=en&amp;q=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>pharmgkb</td>
<td><a href="http://www.pharmgkb.org/rsid/rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>gwascentral</td>
<td><a href="http://www.gwascentral.org/marker/dbSNP:rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>openSNP</td>
<td><a href="https://opensnp.org/snps/rs7574865#users" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td><a href="https://www.snpedia.com/index.php/23andMe_(help)" target="_blank">23andMe</a></td>
<td><a href="https://www.23andme.com/you/explorer/snp/?snp_name=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>23andMe all</td>
<td><a href="https://www.23andme.com/you/search/?isearch=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>SNP Nexus</td>
<td></td>
</tr>
<tr>
<td>SNPshot</td>
<td><a href="http://bioai4core.fulton.asu.edu/snpshot/FactSheet?id=rs7574865&amp;type=RSNO" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>SNPdbe</td>
<td><a href="http://www.rostlab.org/services/snpdbe/dosearch.php?id=mutation&amp;val=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>MSV3d</td>
<td><a href="http://decrypthon.igbmc.fr/msv3d/cgi-bin/humsavar?rsid=rs7574865" target="_blank">rs7574865</a></td>
</tr>
<tr>
<td>GWAS Ctlg</td>
<td><a href="https://www.ebi.ac.uk/gwas/search?query=rs7574865" target="_blank">rs7574865</a></td>
</tr>
</tbody>
</table>
</div>
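<div>It is easy to see that these links all follow a pattern, which is exactly my favorite trick of editing the URL. Under the hood, the rsID is passed to the server as part of an HTTP GET request (forms may use POST instead), and the server builds the page from it.</div>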
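<div>For links that carry the rsID as a query parameter, the same idea can be sketched with the standard library. <code>swap_query_param</code> is a hypothetical helper name; the example URL is the PheGenI link from the table above, and nothing here checks that the site still answers at that address.</div>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def swap_query_param(url, name, value):
    """Return url with the query parameter `name` set to `value`.

    Other parameters keep their original order and values; if `name`
    is absent, the query is returned unchanged.
    """
    parts = urlsplit(url)
    pairs = [(k, value if k == name else v) for k, v in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


phegeni = "http://www.ncbi.nlm.nih.gov/gap/PheGenI?tab=2&rs=7574865"
print(swap_query_param(phegeni, "rs", "9275319"))
```

<div>Links that embed the rsID in the path instead (SNPedia, pharmgkb, gwascentral) only need a plain string substitution.</div>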
<div></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2100.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The current state of GWAS research and where to download the data</title>
		<link>http://www.bio-info-trainee.com/719.html</link>
		<comments>http://www.bio-info-trainee.com/719.html#comments</comments>
		<pubDate>Fri, 08 May 2015 13:12:58 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Bioinformatics basics]]></category>
		<category><![CDATA[GWAS]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=719</guid>
		<description><![CDATA[GWAS research is very popular; NHGRI even maintains a dedicated section to introduce it, and the figure below also &#8230; <a href="http://www.bio-info-trainee.com/719.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h3>GWAS research is very popular; NHGRI even maintains a dedicated section to introduce it. The figure below, also from NHGRI, shows GWAS publications over recent years.</h3>
<h3><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/图片1.png"><img class="alignnone  wp-image-720" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/05/图片1.png" alt="Figure 1" width="562" height="422" /></a></h3>
<p>All of the data can be downloaded from that page:</p>
<p>wget <a href="http://www.genome.gov/admin/gwascatalog.txt">http://www.genome.gov/admin/gwascatalog.txt</a></p>
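<p>As of now (2015-05-08 21:08:34):</p>
<p>The file contains 19,603 rows but only 2,113 distinct PubMed articles, covering more than 7,000 genes.</p>
<p>In total, 293 journals have published GWAS papers (2,113 articles altogether). The article reporting the most associated variants is PMID 23251661 in PLoS One, with 949 rs entries.</p>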
<p>Journals ranked by number of articles (ascending order; the end of the output is shown):<br />
cut -f 2,5 gwascatalog.txt |perl -alne '{$hash{$_}++}END{print "$_" foreach sort {$hash{$a} &lt;=&gt; $hash{$b}} keys %hash}' |cut -f 2 |perl -alne '{$hash{$_}++}END{print "$_\t$hash{$_}" foreach sort {$hash{$a} &lt;=&gt; $hash{$b}} keys %hash}'<br />
Hum Genet 41<br />
Am J Hum Genet 62<br />
Mol Psychiatry 64<br />
PLoS One 132<br />
PLoS Genet 145<br />
Hum Mol Genet 168<br />
Nat Genet 397<br />
Articles ranked by number of rs entries (ascending order; the end of the output is shown):<br />
cut -f 2,5 gwascatalog.txt |perl -alne '{$hash{$_}++}END{print "$_ $hash{$_}" foreach sort {$hash{$a} &lt;=&gt; $hash{$b}} keys %hash}'<br />
24324551 PLoS One 241<br />
24097068 Nat Genet 245<br />
24816252 Nat Genet 299<br />
23382691 PLoS Genet 699<br />
23251661 PLoS One 949</p>
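<p>The two tallies above can also be sketched in Python. This is an illustrative equivalent of the one-liners, not the code used for the numbers in this post; the function names are hypothetical, and the column positions (PUBMEDID in field 2, Journal in field 5) are taken from the <code>cut -f 2,5</code> call. Unlike the shell pipeline, <code>read_catalog</code> skips the header line.</p>

```python
import csv
from collections import Counter


def tally(rows, pmid_col=1, journal_col=4):
    """Count catalog rows per article and distinct articles per journal.

    `rows` is an iterable of lists, one per tab-separated data line.
    Column indices are 0-based, so 1 and 4 are fields 2 and 5.
    """
    rows_per_article = Counter()
    for row in rows:
        if len(row) > max(pmid_col, journal_col):  # ignore truncated lines
            rows_per_article[(row[pmid_col], row[journal_col])] += 1
    # Each (PMID, journal) key is one article, so count keys per journal.
    articles_per_journal = Counter(journal for _, journal in rows_per_article)
    return rows_per_article, articles_per_journal


def read_catalog(path):
    """Yield data rows of the tab-delimited catalog file, skipping the header."""
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # header line
        yield from reader
```

<p>With the downloaded file, <code>tally(read_catalog("gwascatalog.txt"))</code> returns both counters, and <code>Counter.most_common()</code> lists them from largest to smallest.</p>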
<p>The data look like this.</p>
<p>I took the header and the first data row and transposed them, which makes them easier to read:</p>
<table>
<tbody>
<tr>
<td width="225">Date Added to Catalog</td>
<td width="564">10/22/2014</td>
</tr>
<tr>
<td width="225">PUBMEDID</td>
<td width="564">24528284</td>
</tr>
<tr>
<td width="225">First Author</td>
<td width="564">Ji Y</td>
</tr>
<tr>
<td width="225">Date</td>
<td width="564">08/01/2014</td>
</tr>
<tr>
<td width="225">Journal</td>
<td width="564">Br J Clin Pharmacol</td>
</tr>
<tr>
<td width="225">Link</td>
<td width="564">http://www.ncbi.nlm.nih.gov/pubmed/24528284</td>
</tr>
<tr>
<td width="225">Study</td>
<td width="564">Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.</td>
</tr>
<tr>
<td width="225">Disease/Trait</td>
<td width="564">Response to serotonin reuptake inhibitors in major depressive disorder (plasma drug and metabolite levels)</td>
</tr>
<tr>
<td width="225">Initial Sample Description</td>
<td width="564">300 European ancestry Escitalpram treated individuals, 130 European ancestry Citalopram treated individuals</td>
</tr>
<tr>
<td width="225">Replication Sample Description</td>
<td width="564">NA</td>
</tr>
<tr>
<td width="225">Region</td>
<td width="564">17q25.3</td>
</tr>
<tr>
<td width="225">Chr_id</td>
<td width="564">17</td>
</tr>
<tr>
<td width="225">Chr_pos</td>
<td width="564">79831041</td>
</tr>
<tr>
<td width="225">Reported Gene(s)</td>
<td width="564">CBX4</td>
</tr>
<tr>
<td width="225">Mapped_gene</td>
<td width="564">CBX8 - CBX4</td>
</tr>
<tr>
<td width="225">Upstream_gene_id</td>
<td width="564">57332</td>
</tr>
<tr>
<td width="225">Downstream_gene_id</td>
<td width="564">8535</td>
</tr>
<tr>
<td width="225">Snp_gene_ids</td>
<td width="564"></td>
</tr>
<tr>
<td width="225">Upstream_gene_distance</td>
<td width="564">33.93</td>
</tr>
<tr>
<td width="225">Downstream_gene_distance</td>
<td width="564">2.12</td>
</tr>
<tr>
<td width="225">Strongest SNP-Risk Allele</td>
<td width="564">rs9747992-?</td>
</tr>
<tr>
<td width="225">SNPs</td>
<td width="564">rs9747992</td>
</tr>
<tr>
<td width="225">Merged</td>
<td width="564">0</td>
</tr>
<tr>
<td width="225">Snp_id_current</td>
<td width="564">9747992</td>
</tr>
<tr>
<td width="225">Context</td>
<td width="564">Intergenic</td>
</tr>
<tr>
<td width="225">Intergenic</td>
<td width="564">1</td>
</tr>
<tr>
<td width="225">Risk Allele Frequency</td>
<td width="564">0.086</td>
</tr>
<tr>
<td width="225">p-Value</td>
<td width="564">2.00E-07</td>
</tr>
<tr>
<td width="225">Pvalue_mlog</td>
<td width="564">6.698970004</td>
</tr>
<tr>
<td width="225">p-Value (text)</td>
<td width="564">(S-DCT concentration)</td>
</tr>
<tr>
<td width="225">OR or beta</td>
<td width="564">NR</td>
</tr>
<tr>
<td width="225">95% CI (text)</td>
<td width="564">NR</td>
</tr>
<tr>
<td width="225">Platform [SNPs passing QC]</td>
<td width="564">Illumina [7,537,437] (Imputed)</td>
</tr>
<tr>
<td width="225">CNV</td>
<td width="564">N</td>
</tr>
</tbody>
</table>
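<p>A transposed view like the one above can be produced with a few lines of Python. This is a sketch under the assumption that the file is tab-delimited with one header line; <code>transpose_first_record</code> is a hypothetical name.</p>

```python
def transpose_first_record(lines, sep="\t"):
    """Pair each column name with the value from the first data row.

    `lines` is any iterable of text lines (for example an open file);
    only the first two lines are consumed.
    """
    it = iter(lines)
    header = next(it).rstrip("\r\n").split(sep)
    first = next(it).rstrip("\r\n").split(sep)
    # Pad a short row so that trailing empty fields are still listed.
    first += [""] * (len(header) - len(first))
    return list(zip(header, first))
```

<p>Passing an open handle on gwascatalog.txt and printing each pair on its own line gives the two-column layout shown in the table.</p>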
<p>The file above is tab-delimited; the meaning of each column is as follows.</p>
<p>Note: The SNP data in the catalog has been mapped to dbSNP Build 142 and Genome Assembly, GRCh38/hg37.p13.</p>
<p>DATE ADDED TO CATALOG: Date added to catalog</p>
<p>PUBMEDID: PubMed identification number</p>
<p>FIRST AUTHOR: Last name of first author</p>
<p>DATE: Publication date (online (epub) date if available)</p>
<p>JOURNAL: Abbreviated journal name</p>
<p>LINK: PubMed URL</p>
<p>STUDY: Title of paper (linked to PubMed abstract)</p>
<p>DISEASE/TRAIT: Disease or trait examined in study</p>
<p>INITIAL SAMPLE SIZE: Sample size for Stage 1 of GWAS</p>
<p>REPLICATION SAMPLE SIZE: Sample size for subsequent replication(s)</p>
<p>REGION: Cytogenetic region associated with rs number (NCBI)</p>
<p>CHR_ID: Chromosome number associated with rs number (NCBI)</p>
<p>CHR_POS: Chromosomal position associated with rs number (dbSNP Build 132, NCBI)</p>
<p>REPORTED GENE(S): Gene(s) reported by author</p>
<p>MAPPED GENE(S): Gene(s) mapped to the strongest SNP (NCBI). If the SNP is located within a gene, that gene is listed. If the SNP is intergenic, the upstream and downstream genes are listed, separated by a hyphen.</p>
<p>UPSTREAM_GENE_ID: Entrez Gene ID for nearest upstream gene to rs number, if not within gene (NCBI)</p>
<p>DOWNSTREAM_GENE_ID: Entrez Gene ID for nearest downstream gene to rs number, if not within gene (NCBI)</p>
<p>SNP_GENE_IDS: Entrez Gene ID, if rs number within gene; multiple genes denotes overlapping transcripts (NCBI)</p>
<p>UPSTREAM_GENE_DISTANCE: distance in kb for nearest upstream gene to rs number, if not within gene (NCBI)</p>
<p>DOWNSTREAM_GENE_DISTANCE: distance in kb for nearest downstream gene to rs number, if not within gene (NCBI)</p>
<p>STRONGEST SNP-RISK ALLELE: SNP(s) most strongly associated with trait + risk allele (? for unknown risk allele). May also refer to a haplotype.</p>
<p>SNPS: Strongest SNP; if a haplotype is reported above, may include more than one rs number (multiple SNPs comprising the haplotype)</p>
<p>MERGED: denotes whether the SNP has been merged into a subsequent rs record (0 = no; 1 = yes; NCBI)</p>
<p>SNP_ID_CURRENT: current rs number (will differ from strongest SNP when merged = 1)</p>
<p>CONTEXT: SNP functional class (NCBI)</p>
<p>INTERGENIC: denotes whether SNP is in intergenic region (0 = no; 1 = yes; NCBI)</p>
<p>RISK ALLELE FREQUENCY: Reported risk allele frequency associated with strongest SNP</p>
<p>P-VALUE: Reported p-value for strongest SNP risk allele (linked to dbGaP Association Browser)</p>
<p>PVALUE_MLOG: -log(p-value)</p>
<p>P-VALUE (TEXT): Information describing context of p-value (e.g. females, smokers).</p>
<p>Note that p-values are rounded to 1 significant digit (for example, a published p-value of 4.8 × 10<sup>-7</sup> is rounded to 5 × 10<sup>-7</sup>).</p>
<p>OR or BETA: Reported odds ratio or beta-coefficient associated with strongest SNP risk allele</p>
<p>95% CI (TEXT): Reported 95% confidence interval associated with strongest SNP risk allele</p>
<p>PLATFORM (SNPS PASSING QC): Genotyping platform manufacturer used in Stage 1; also includes notation of pooled DNA study design or imputation of SNPs, where applicable</p>
<p>CNV: Study of copy number variation (yes/no)</p>
<p>Updated: January 13, 2015</p>
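<p>As a small worked example of the P-VALUE and PVALUE_MLOG definitions, the sketch below rounds a p-value to one significant digit and takes its negative logarithm. The base-10 logarithm is an assumption, though it is consistent with the sample record earlier in this post (p-Value 2.00E-07, Pvalue_mlog 6.698970004). Both function names are hypothetical.</p>

```python
import math


def round_to_one_significant_digit(p):
    """Round a p-value to one significant digit, e.g. 4.8e-7 becomes 5e-7."""
    return float(f"{p:.0e}")


def pvalue_mlog(p):
    """Return -log10(p); base 10 is assumed from the sample record."""
    return -math.log10(p)


print(round_to_one_significant_digit(4.8e-7))
print(round(pvalue_mlog(2e-7), 9))
```

<p>Because the catalog stores rounded p-values, PVALUE_MLOG computed from the file can differ slightly from the value implied by the original publication.</p>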
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/719.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
