<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Rice</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e6%b0%b4%e7%a8%bb/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Leave a comment and join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Over 3,000 rice whole-genome sequencing datasets shared publicly, mostly variant data</title>
		<link>http://www.bio-info-trainee.com/1053.html</link>
		<comments>http://www.bio-info-trainee.com/1053.html#comments</comments>
		<pubDate>Fri, 16 Oct 2015 11:35:01 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[snp]]></category>
		<category><![CDATA[rice]]></category>
		<category><![CDATA[mutation]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1053</guid>
		<description><![CDATA[The more bioinformatics I learn lately, the more I feel that the era of big data has really arrived. Researchers today &#8230; <a href="http://www.bio-info-trainee.com/1053.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
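				<content:encoded><![CDATA[<p>The more bioinformatics I learn lately, the more I feel that the era of big data has really arrived. Many researchers today could do much of their work from home, since most large datasets are publicly available. Still, working alone behind closed doors only gets you so far, and the exchange of ideas with other people matters a lot.</p>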
<div><a href="https://aws.amazon.com/cn/blogs/aws/new-aws-public-data-set-3000-rice-genome/">https://aws.amazon.com/cn/blogs/aws/new-aws-public-data-set-3000-rice-genome/</a></div>
<div><a href="https://aws.amazon.com/cn/public-data-sets/3000-rice-genome/">https://aws.amazon.com/cn/public-data-sets/3000-rice-genome/</a></div>
<div><a href="https://wiki.dnanexus.com/Featured-Projects/3000-rice-genomes">https://wiki.dnanexus.com/Featured-Projects/3000-rice-genomes</a></div>
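<div>These pages describe more than 3,000 rice whole-genome sequencing datasets, all shared on the Amazon cloud. The data are paired-end whole-genome reads from 3,024 rice samples, aligned to five different rice reference genomes, with variants called mainly with GATK.</div>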
<div>The data providers also supply a standard SNP-calling pipeline:</div>
<div>
<pre>I have used much the same pipeline myself.
SNP Pipeline Commands

1. Index the reference genome using bwa index

   /software/bwa-0.7.10/bwa index /reference/japonica/reference.fa

2. Align the paired reads to reference genome using bwa mem. 
   Note: Set the number of threads with the -t parameter. How many threads you can use depends on the machine that runs the command.

   /software/bwa-0.7.10/bwa mem -M -t 8 /reference/japonica/reference.fa /reads/filename_1.fq.gz /reads/filename_2.fq.gz &gt; /output/filename.sam

3. Sort SAM file and output as BAM file

   java -Xmx8g -jar /software/picard-tools-1.119/SortSam.jar INPUT=/output/filename.sam OUTPUT=/output/filename.sorted.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE

4. Fix mate information

   java -Xmx8g -jar /software/picard-tools-1.119/FixMateInformation.jar INPUT=/output/filename.sorted.bam OUTPUT=/output/filename.fxmt.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE

5. Mark duplicate reads

   java -Xmx8g -jar /software/picard-tools-1.119/MarkDuplicates.jar INPUT=/output/filename.fxmt.bam OUTPUT=/output/filename.mkdup.bam METRICS_FILE=/output/filename.metrics VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000

6. Add or replace read groups

   java -Xmx8g -jar /software/picard-tools-1.119/AddOrReplaceReadGroups.jar INPUT=/output/filename.mkdup.bam OUTPUT=/output/filename.addrep.bam RGID=readname PL=Illumina SM=readname CN=BGI VALIDATION_STRINGENCY=LENIENT SO=coordinate CREATE_INDEX=TRUE

7. Create index and dictionary for reference genome

   /software/samtools-1.0/samtools faidx /reference/japonica/reference.fa
   
   java -Xmx8g -jar /software/picard-tools-1.119/CreateSequenceDictionary.jar REFERENCE=/reference/japonica/reference.fa OUTPUT=/reference/japonica/reference.dict

8. Create realignment targets

   java -Xmx8g -jar /software/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T RealignerTargetCreator -I /output/filename.addrep.bam -R /reference/japonica/reference.fa -o /output/filename.intervals -fixMisencodedQuals -nt 8

9. Indel Realigner

   java -Xmx8g -jar /software/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T IndelRealigner -fixMisencodedQuals -I /output/filename.addrep.bam -R /reference/japonica/reference.fa -targetIntervals /output/filename.intervals -o /output/filename.realn.bam 

10. Merge individual BAM files if there are multiple read pairs per sample

   /software/samtools-1.0/samtools merge /output/filename.merged.bam /output/*.realn.bam
   /software/samtools-1.0/samtools index /output/filename.merged.bam

11. Call SNPs using Unified Genotyper

   java -Xmx8g -jar /software/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /reference/japonica/reference.fa -I /output/filename.merged.bam -o /output/filename.merged.vcf -glm BOTH -mbq 20 --genotyping_mode DISCOVERY -out_mode EMIT_ALL_SITES</pre>
</div>
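<div>Running these commands for thousands of samples by hand gets tedious, so here is a rough sketch of how the three Picard steps (3 to 5) might be wrapped in a small shell script. It is not part of the published pipeline: the names OUT_DIR, PICARD_DIR, DRY_RUN, run, stage_bam, picard and picard_steps are all made up for illustration, and only the tool paths and Picard options come from the commands above. It also passes an explicit SORT_ORDER=coordinate to SortSam, on the assumption that this Picard version wants the sort order spelled out. By default it only prints the commands, so you can check the file names before anything runs.</div>

```shell
#!/bin/sh
# Illustrative sketch only. All variable and function names here are
# hypothetical; tool paths and Picard options are taken from the steps above.
set -eu

OUT_DIR="${OUT_DIR:-/output}"
PICARD_DIR="${PICARD_DIR:-/software/picard-tools-1.119}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also execute them

# Print a command line; execute it only when DRY_RUN is 0.
run() {
    printf '%s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Build the path of an intermediate BAM from a sample name and a stage label.
stage_bam() {
    printf '%s/%s.%s.bam\n' "$OUT_DIR" "$1" "$2"
}

# Call one Picard tool, appending the options shared by every step.
picard() {
    tool="$1"
    shift
    run java -Xmx8g -jar "$PICARD_DIR/$tool.jar" "$@" \
        VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE
}

# Steps 3 to 5 for one sample: sort, fix mate information, mark duplicates.
picard_steps() {
    sample="$1"
    picard SortSam INPUT="$OUT_DIR/$sample.sam" \
        OUTPUT="$(stage_bam "$sample" sorted)" SORT_ORDER=coordinate
    picard FixMateInformation INPUT="$(stage_bam "$sample" sorted)" \
        OUTPUT="$(stage_bam "$sample" fxmt)" SO=coordinate
    picard MarkDuplicates INPUT="$(stage_bam "$sample" fxmt)" \
        OUTPUT="$(stage_bam "$sample" mkdup)" \
        METRICS_FILE="$OUT_DIR/$sample.metrics" \
        MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000
}

# When a sample name is given on the command line, process that sample.
if [ "$#" -gt 0 ]; then
    picard_steps "$1"
fi
```

<div>Saved under a name of your choice and called with a sample name such as filename, the script prints the three Picard command lines for that sample; setting DRY_RUN=0 in the environment makes it execute them as well. Keeping the shared options in one helper means a change to, say, the validation stringency only has to be made once.</div>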
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1053.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
