肿瘤全外显子测序数据分析流程大放送

这个一个肿瘤外显子项目的文章发表并且公布的公共数据,我这里给出全套分析流程代码。只需要你肯实践,就可以运行成功。

PS:有些后起之秀自己运营公众号或者博客喜欢批评我们这些老人,一味的堆砌代码不给解释,恶意揣测我们是因为不懂代码的原理。我表示很无语,我写了3千多篇教程,如果一篇篇都重复提到基础知识,我真的做不到。比如下面的流程,包括软件的用法,软件安装,注释数据库的下载,我博客都说过好几次了,直播我的基因组系列也详细解读过,我告诉你去哪里学,你却不珍惜,不当回事,呵呵。

文章解读

paper;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4812767/a whole-exome sequencing (WES) study was performed in 161 NPC cases and 895 controls of Southern Chinese descent

测序概述:

We sequenced the blood samples from 161 NPC cases, including 39 EAO cases, 63 FH+ cases from 52 independent families, and 59 sporadic cases by WES and achieved an average coverage of 49-fold on target (range of 32- to 76-fold)

后来还扩大了验证样本集:

An additional 2,160 NPC cases and 2,433 healthy controls from Hong Kong were further examined for the selected candidate variants.

数据全部上传了:

Whole-exome sequencing data for the early-age onset cases have been deposited in the Sequence Read Archive (SRA) database (accession ID SRA291701).

了解WES数据分析步骤:2016-A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data

选择部分数据如下:

随便选择了5个样本,可以在SRA数据库下载拿到这些信息(这个小技巧自行探索,或者看我B站教学视频)其ID号及描述如下,

SRX445405   MALE    NPC15   SRR1139956  NPC15F  NO  SRS540548   NPC15F-T
SRX445406   MALE    NPC15   SRR1139958  NPC15F  NO  SRS540549   NPC15F-N
SRX445407   MALE    NPC29   SRR1139966  NPC29F  YES SRS540550   NPC29F-T
SRX445408   MALE    NPC29   SRR1139973  NPC29F  YES SRS540551   NPC29F-N
SRX445409   FEMALE  NPC10   SRR1139999  NPC10F  NO  SRS540552   NPC10F-T
SRX445410   FEMALE  NPC10   SRR1140007  NPC10F  NO  SRS540553   NPC10F-N
SRX445411   FEMALE  NPC34   SRR1140015  NPC34F  NO  SRS540554   NPC34F-T
SRX445412   FEMALE  NPC34   SRR1140023  NPC34F  NO  SRS540555   NPC34F-N
SRX445413   MALE    NPC37   SRR1140044  NPC37F  YES SRS540556   NPC37F-T
SRX445414   MALE    NPC37   SRR1140045  NPC37F  YES SRS540557   NPC37F-N

从SRA数据库下载并转换为fastq测序数据文件

把上面的描述文本存为文件npc.sra.txt下载,脚本如下:

cat npc.sra.txt | cut -f 4|while read id
do echo $id
wget -c ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP035/SRP035573/$id/$id.sra
done

上面的代码是3年前的,目前大家都用sratoolkit的prefectch命令来根据id号直接下载,并不需要自行拼凑下载链接了,现在学生信真的是越来越幸福了,一切从简,记得看我B站教学视频:https://space.bilibili.com/338686099/#/

转换脚本如下:

cat npc.sra.txt | while read id
do
array=($id)
echo  ${array[3]}.sra  ${array[7]} 
~/biosoft/sratoolkit/sratoolkit.2.8.2-1-centos_linux64/bin/fastq-dump --gzip --split-3 -A \  
${array[7]}  ${array[3]}.sra 
done

得到的fastq文件如下:

3.5G Aug 26 09:48 NPC10F-N_1.fastq.gz
3.6G Aug 26 09:48 NPC10F-N_2.fastq.gz
3.2G Aug 26 09:48 NPC10F-T_1.fastq.gz
3.3G Aug 26 09:48 NPC10F-T_2.fastq.gz
2.7G Aug 26 09:47 NPC15F-N_1.fastq.gz
2.8G Aug 26 09:47 NPC15F-N_2.fastq.gz
2.8G Aug 26 09:47 NPC15F-T_1.fastq.gz
2.9G Aug 26 09:47 NPC15F-T_2.fastq.gz
2.8G Aug 26 09:47 NPC29F-N_1.fastq.gz
2.9G Aug 26 09:47 NPC29F-N_2.fastq.gz
2.4G Aug 26 09:47 NPC29F-T_1.fastq.gz
2.5G Aug 26 09:47 NPC29F-T_2.fastq.gz
2.0G Aug 26 09:47 NPC34F-N_1.fastq.gz
2.0G Aug 26 09:47 NPC34F-N_2.fastq.gz
2.2G Aug 26 09:48 NPC34F-T_1.fastq.gz
2.3G Aug 26 09:48 NPC34F-T_2.fastq.gz
2.1G Aug 26 09:47 NPC37F-N_1.fastq.gz
2.1G Aug 26 09:47 NPC37F-N_2.fastq.gz
2.2G Aug 26 09:46 NPC37F-T_1.fastq.gz
2.2G Aug 26 09:46 NPC37F-T_2.fastq.gz

简单的走一下fastqc+multiqc 看看数据质量,一般都会很不错的,这个数据也不例外。

ls *.gz |xargs ~/biosoft/fastqc/FastQC/fastqc -o ./ -t 5

但是值得注意的是本文章中的测序reads的编码格式是phred+64 而不是传统的33

然后走WES的标准SNP-calling流程

选用的是经典的GATK best practice的流程,代码如下:

​
module load java/1.8.0_91
GENOME=/home/jianmingzeng/reference/genome/human_g1k_v37/human_g1k_v37.fasta
INDEX=/home/jianmingzeng/reference/index/bwa/human_g1k_v37
GATK=/home/jianmingzeng/biosoft/GATK/GenomeAnalysisTK.jar
PICARD=/home/jianmingzeng/biosoft/picardtools/2.9.2/picard.jar
​
DBSNP=/home/jianmingzeng/annotation/variation/human/dbSNP/All_20160601.vcf.gz
SNP=/home/jianmingzeng/biosoft/GATK/resources/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
INDEL=/home/jianmingzeng/biosoft/GATK/resources/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
​
KG_SNP=/home/jianmingzeng/biosoft/GATK/resources/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
Mills_indels=/home/jianmingzeng/biosoft/GATK/resources/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf
KG_indels=/home/jianmingzeng/biosoft/GATK/resources/bundle/b37/1000G_phase1.indels.b37.vcf
​
TMPDIR=/home/jianmingzeng/tmp/software
## samtools and bwa are in the environment
## samtools Version: 1.3.1 (using htslib 1.3.1)
## bwa Version: 0.7.15-r1140
​
​
#arr=($1)
#fq1=${arr[0]}
#fq2=${arr[1]}
#sample=${arr[2]}fq1=$1
fq2=$2
sample=$3
​
​
#####################################################
################ Step 1 : Alignment #################
#####################################################
start=$(date +%s.%N)
echo bwa `date`
bwa mem -t 5 -M  -R "@RG\tID:$sample\tSM:$sample\tLB:WES\tPL:Illumina" $INDEX $fq1 $fq2 1>$sample.sam 2>/dev/null 
echo bwa `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for BWA : %.6f seconds" $dur
echo#####################################################
################ Step 2: Sort and Index #############
#####################################################
start=$(date +%s.%N)
echo SortSam `date`
java -Djava.io.tmpdir=$TMPDIR    -Xmx40g -jar $PICARD SortSam SORT_ORDER=coordinate INPUT=$sample.sam OUTPUT=$sample.bam
samtools index $sample.bam
echo SortSam `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for SortSam : %.6f seconds" $dur
echo 
rm $sample.sam 
​
#####################################################
################ Step 3: Basic Statistics ###########
#####################################################
start=$(date +%s.%N)
echo stats `date`
samtools flagstat $sample.bam > ${sample}.alignment.flagstat
samtools stats  $sample.bam > ${sample}.alignment.stat
echo plot-bamstats -p ${sample}_QC  ${sample}.alignment.stat
echo stats `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for Basic Statistics : %.6f seconds" $dur
echo#####################################################
####### Step 4: multiple filtering for bam files ####
#####################################################
​
​
###MarkDuplicates###
start=$(date +%s.%N)
echo MarkDuplicates `date`
java -Djava.io.tmpdir=$TMPDIR    -Xmx40g -jar $PICARD MarkDuplicates \
INPUT=$sample.bam OUTPUT=${sample}_marked.bam METRICS_FILE=$sample.metrics
echo MarkDuplicates `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for MarkDuplicates : %.6f seconds" $dur
echo 
rm $sample.bam  
​
###FixMateInfo###
start=$(date +%s.%N)
echo FixMateInfo `date`
java -Djava.io.tmpdir=$TMPDIR    -Xmx40g -jar $PICARD FixMateInformation \
INPUT=${sample}_marked.bam OUTPUT=${sample}_marked_fixed.bam SO=coordinate
samtools index ${sample}_marked_fixed.bam
echo FixMateInfo `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for FixMateInfo  : %.6f seconds" $dur
echo 
​
​
rm ${sample}_marked.bam 
​
#####################################################
####### Step 5: gatk process bam files ##############
#####################################################
​
​
​
### SplitNCigar ###
start=$(date +%s.%N)
echo SplitNCigar `date`
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T SplitNCigarReads \
 -R $GENOME  -I ${sample}_marked_fixed.bam  -o ${sample}_marked_fixed_split.bam \
-rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS 
#--fix_misencoded_quality_scores
## --fix_misencoded_quality_scores only if phred 64 
echo SplitNCigar `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for SplitNCigar : %.6f seconds" $dur
echorm ${sample}_marked_fixed.bam 
# rm ${sample}.sam ${sample}.bam ${sample}_marked.bam ${sample}_marked_fixed.bam
​
​
###RealignerTargetCreator###
start=$(date +%s.%N)
echo RealignerTargetCreator `date`
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T RealignerTargetCreator \
-I ${sample}_marked_fixed_split.bam -R $GENOME -o ${sample}_target.intervals \
-known $Mills_indels -known $KG_indels -nt 5
echo RealignerTargetCreator `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for RealignerTargetCreator : %.6f seconds" $dur
echo 
​
​
###IndelRealigner###
start=$(date +%s.%N)
echo IndelRealigner `date`
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T IndelRealigner \
-I ${sample}_marked_fixed_split.bam  -R $GENOME -targetIntervals ${sample}_target.intervals \
-o ${sample}_realigned.bam -known $Mills_indels -known $KG_indels
echo IndelRealigner `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for IndelRealigner : %.6f seconds" $dur
echorm ${sample}_marked_fixed_split.bam
​
###BaseRecalibrator###
start=$(date +%s.%N)
echo BaseRecalibrator `date`
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T BaseRecalibrator \
-I ${sample}_realigned.bam -R $GENOME -o ${sample}_temp.table -knownSites $DBSNP
echo BaseRecalibrator `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for BaseRecalibrator : %.6f seconds" $dur
echo 
​
​
###PrintReads###
start=$(date +%s.%N)
echo PrintReads `date`
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T PrintReads \
-R $GENOME -I ${sample}_realigned.bam -o ${sample}_recal.bam -BQSR ${sample}_temp.table
samtools index ${sample}_recal.bam
echo PrintReads `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for PrintReads : %.6f seconds" $dur
echorm  ${sample}_realigned.bam
chmod uga=r   ${sample}_recal.bam 
 
#####################################################
############## Step 6: gatk call snp/indel##########
######################################################## 
start=$(date +%s.%N)
echo HaplotypeCaller `date`
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T HaplotypeCaller  \
-R $GENOME -I ${sample}_recal.bam --dbsnp $DBSNP  \
-stand_emit_conf 10 -o  ${sample}_raw.snps.indels.vcf
echo HaplotypeCaller `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for HaplotypeCaller : %.6f seconds" $dur
echo 
​
​
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T SelectVariants  -R $GENOME  \
-selectType SNP \
-V ${sample}_raw.snps.indels.vcf -o ${sample}_raw_snps.vcf
​
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T SelectVariants  -R $GENOME \
-selectType INDEL  \
-V ${sample}_raw.snps.indels.vcf   -o ${sample}_raw_indels.vcf
​
​
## 
:'
'
## for SNP
​
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T VariantFiltration -R $GENOME  \
--filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"  \
--filterName "my_snp_filter" \
-V ${sample}_raw_snps.vcf  -o ${sample}_filtered_snps.vcf   
​
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T SelectVariants -R $GENOME  \
--excludeFiltered \
-V ${sample}_filtered_snps.vcf  -o  ${sample}_filtered_PASS.snps.vcf
​
​
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T VariantEval -R $GENOME  \
-eval ${sample}_filtered_PASS.snps.vcf -o  ${sample}_filtered_PASS.snps.vcf.eval
​
​
## for  INDEL
​
​
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T VariantFiltration -R $GENOME  \
--filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0"  \
--filterName "my_indel_filter" \
-V ${sample}_raw_indels.vcf  -o ${sample}_filtered_indels.vcf   
​
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T SelectVariants -R $GENOME  \
--excludeFiltered \
-V ${sample}_filtered_indels.vcf  -o  ${sample}_filtered_PASS.indels.vcf
​
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK -T VariantEval -R $GENOME  \
-eval ${sample}_filtered_PASS.indels.vcf -o  ${sample}_filtered_PASS.indels.vcf.eval
​
​
:'
'
## bam文件的QC
bedtools coverage  -hist   -abam ${sample}.bam  \
-b ~/annotation/CCDS/human/hg19_exon.bed  >${sample}.exome.coverage.hist.txt
## 后面的几个质控,代码是我自己写的,所以大家可以不运行。
perl ~/scripts/calc_coverage_depth.pl ~/annotation/CCDS/human/CCDS.20110907.txt ${sample}.bam
​
​
​

比对成功后得到的bam文件如下;

7.7G Aug 26 19:22 NPC10F-N_marked_fixed.bam
7.7G Aug 26 22:59 NPC10F-N_marked_fixed_split.bam
7.7G Aug 26 23:57 NPC10F-N_realigned.bam
 15G Aug 27 03:45 NPC10F-N_recal.bam 
 
7.0G Aug 26 19:49 NPC10F-T_marked_fixed.bam
7.0G Aug 26 22:55 NPC10F-T_marked_fixed_split.bam
7.0G Aug 26 23:48 NPC10F-T_realigned.bam
 13G Aug 27 03:12 NPC10F-T_recal.bam

Snp-calling结束后得到的vcf如下:

    82576 NPC10F-N_raw_indels.vcf
   863243 NPC10F-N_raw_snps.vcf
   945753 NPC10F-N_recal_raw.snps.indels.vcf
    89604 NPC10F-T_raw_indels.vcf
   819657 NPC10F-T_raw_snps.vcf
   909190 NPC10F-T_recal_raw.snps.indels.vcf

这些vcf里面的变异位点还需要进行简单的过滤,或者只提取外显子区域的变异位点。

消耗时间如下;

# for NPC10F-N_1.fastq.gz NPC10F-N_2.fastq.gz NPC10F-N
bwa Sat Aug 26 16:04:44 CST 2017
bwa Sat Aug 26 17:07:11 CST 2017
SortSam Sat Aug 26 17:07:11 CST 2017
SortSam Sat Aug 26 17:45:04 CST 2017
stats Sat Aug 26 17:45:04 CST 2017
plot-bamstats -p NPC10F-N_QC NPC10F-N.alignment.stat
stats Sat Aug 26 17:56:07 CST 2017
MarkDuplicates Sat Aug 26 17:56:07 CST 2017
MarkDuplicates Sat Aug 26 18:40:25 CST 2017
FixMateInfo Sat Aug 26 18:40:25 CST 2017
FixMateInfo Sat Aug 26 19:26:39 CST 2017
SplitNCigar Sat Aug 26 22:20:48 CST 2017
SplitNCigar Sat Aug 26 22:59:32 CST 2017
RealignerTargetCreator Sat Aug 26 22:59:32 CST 2017
RealignerTargetCreator Sat Aug 26 23:17:35 CST 2017
IndelRealigner Sat Aug 26 23:17:35 CST 2017
IndelRealigner Sat Aug 26 23:57:35 CST 2017
BaseRecalibrator Sat Aug 26 23:57:35 CST 2017
BaseRecalibrator Sun Aug 27 01:27:39 CST 2017
PrintReads Sun Aug 27 01:27:39 CST 2017
PrintReads Sun Aug 27 03:52:03 CST 2017
​
​
#for NPC10F-T_1.fastq.gz NPC10F-T_2.fastq.gz NPC10F-T
bwa Sat Aug 26 16:54:14 CST 2017
bwa Sat Aug 26 17:48:07 CST 2017
SortSam Sat Aug 26 17:48:07 CST 2017
SortSam Sat Aug 26 18:20:48 CST 2017
stats Sat Aug 26 18:20:48 CST 2017
plot-bamstats -p NPC10F-T_QC NPC10F-T.alignment.stat
stats Sat Aug 26 18:30:45 CST 2017
MarkDuplicates Sat Aug 26 18:30:45 CST 2017
MarkDuplicates Sat Aug 26 19:11:40 CST 2017
FixMateInfo Sat Aug 26 19:11:40 CST 2017
FixMateInfo Sat Aug 26 19:52:37 CST 2017
SplitNCigar Sat Aug 26 22:20:54 CST 2017
SplitNCigar Sat Aug 26 22:55:53 CST 2017
RealignerTargetCreator Sat Aug 26 22:55:53 CST 2017
RealignerTargetCreator Sat Aug 26 23:14:19 CST 2017
IndelRealigner Sat Aug 26 23:14:19 CST 2017
IndelRealigner Sat Aug 26 23:48:43 CST 2017
BaseRecalibrator Sat Aug 26 23:48:43 CST 2017
BaseRecalibrator Sun Aug 27 01:10:26 CST 2017
PrintReads Sun Aug 27 01:10:26 CST 2017
PrintReads Sun Aug 27 03:18:33 CST 2017

外显子的QC结果是:

## for NPC10F-N
32541594    2392028455  0.982188933530586   73.5068004044301
18711840    934414537   0.971564415370185   49.9370739061471
17075251    505674931   0.895198983425405   29.6144947444696
15656543    241509710   0.819070615186704   15.4254812189383
​
## for NPC10F-T
32348763    2946257280  0.97636879840624    91.0778962398037
18182204    1054428864  0.94406442121146    57.9923569221861
16075260    555403212   0.842772759844003   34.5501853158207
14181528    265599059   0.741905340358179   18.7285219900141

可以看到T的测序深度高于N,符合实验设计。T的外显子区域平均测序深度高达91X,是比较好的数据了,而且外显子旁边50bp范围区域平均测序深度还有57X,旁边100bp区域还有34X,也说明这款芯片的捕获效率比较好

然后走走somatic mutation calling流程

因为是配对数据,还可以走somatic mutation calling流程

reference=/home/jianmingzeng/reference/genome/human_g1k_v37/human_g1k_v37.fasta
GENOME=/home/jianmingzeng/reference/genome/human_g1k_v37/human_g1k_v37.fasta
TMPDIR=/home/jianmingzeng/tmp/software
normal_bam=NPC10F-N_recal.bam
tumor_bam=NPC10F-T_recal.bam
sample=NPC10F
​
#####################################################
################### Step : Run VarScan #############
#####################################################
normal_pileup="samtools mpileup -q 1 -f $reference $normal_bam";
tumor_pileup="samtools mpileup -q 1 -f $reference $tumor_bam";
# Next, issue a system call that pipes input from these commands into VarScan :
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g  -jar ~/biosoft/VarScan/VarScan.v2.3.9.jar \
somatic <($normal_pileup) <($tumor_pileup) $sample
java -jar ~/biosoft/VarScan/VarScan.v2.3.9.jar processSomatic $sample.snp
​
#####################################################
################### Step : Run Mutect2 #############
##################################################### 
java -Djava.io.tmpdir=$TMPDIR   -Xmx40g -jar $GATK  -T MuTect2 \
-R $GENOME -I:tumor $tumor_bam  -I:normal $normal_bam \
--dbsnp  $DBSNP   -o ${sample}-mutect2.vcf
​
#####################################################
################### Step : Run Muse#################
#####################################################
~/biosoft/muse/muse call -O $sample -f $reference $tumor_bam $normal_bam
~/biosoft/muse/muse sump -I ${sample}.MuSE.txt -E –O ${sample}.vcf –D $DBSNP

其中varscan耗费两个半小时,结果如下:

893539210 positions in tumor
891822458 positions shared in normal
79827518 had sufficient coverage for comparison
79718238 were called Reference
26 were mixed SNP-indel calls and filtered
102492 were called Germline
3703 were called LOH
2569 were called Somatic
490 were called Unknown
0 were called Variant
​
Reading input from NPC10F.snp
Opening output files: NPC10F.snp.Somatic NPC10F.snp.Germline NPC10F.snp.LOH
101352 VarScan calls processed
2342 were Somatic (556 high confidence)
95144 were Germline (4830 high confidence)
3502 were LOH (3484 high confidence)

这3个软件找到的somatic mutation可以互相对比一下,差异主要是在哪里,都值得自己去探究,这样最终确定的肿瘤外显子测序数据分析流程就更有把握。

需要安装的软件

软件太多了,我就不一一列出具体代码了,还有很多需要下载的参考基因组,变异数据库也是以前直播基因组的时候已经反复提到过,也不赘述啦。

conda install -c bioconda bedtools
conda install -c bioconda bwa
conda install -c bioconda samtools
​
cd ~/biosoft
mkdir sratoolkit &&  cd sratoolkit
wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz
tar zxvf sratoolkit.current-centos_linux64.tar.gz
~/biosoft/sratoolkit/sratoolkit.2.8.2-1-centos_linux64/bin/fastdump -h
## https://sourceforge.net/projects/picard/
## https://github.com/broadinstitute/picard
cd ~/biosoft
mkdir picardtools &&  cd picardtools
wget http://ncu.dl.sourceforge.net/project/picard/picard-tools/1.119/picard-tools-1.119.zip
unzip picard-tools-1.119.zip
mkdir 2.9.2 && cd 2.9.2 
wget https://github.com/broadinstitute/picard/releases/download/2.9.2/picard.jar
​
​
cd ~/biosoft
## https://sourceforge.net/projects/varscan/files/
## http://varscan.sourceforge.net/index.html
mkdir VarScan  &&  cd VarScan  
wget https://sourceforge.net/projects/varscan/files/VarScan.v2.3.9.jar 
​
cd ~/biosoft
mkdir SnpEff &&  cd SnpEff
##  http://snpeff.sourceforge.net/
##  http://snpeff.sourceforge.net/SnpSift.html 
##  http://snpeff.sourceforge.net/SnpEff_manual.html
wget http://sourceforge.net/projects/snpeff/files/snpEff_latest_core.zip 
## java -jar snpEff.jar download GRCh37.75
## java -Xmx4G -jar snpEff.jar -i vcf -o vcf GRCh37.75 example.vcf > example_snpeff.vcf
unzip snpEff_latest_core.zip

Comments are closed.