Why is STAR's speed so erratic?

The FASTQ files are about the same size, yet the run times are completely different!

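If you are passing by, please check your own RNA-seq data and test your STAR runs to see whether you observe the same behavior. Feel free to email me to discuss it: jmzeng1314 at 163 mail.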

This sample took about 3 hours.

1.5G Dec  1 23:45 /MK2305-4LA_1.fq.gz
1.7G Dec  1 23:47 /MK2305-4LA_2.fq.gz

MK2305-4LA /MK2305-4LA_1.fq.gz /MK2305-4LA_2.fq.gz

May 23 20:22:34 ..... started STAR run
May 23 20:22:35 ..... loading genome
May 23 20:25:15 ..... started 1st pass mapping
May 23 21:10:43 ..... finished 1st pass mapping
May 23 21:10:44 ..... inserting junctions into the genome indices
May 23 21:13:32 ..... started mapping
May 23 23:14:01 ..... finished successfully

The next sample took about 16 hours.

1.9G Dec  2 10:17 /MK2313-3RA_1.fq.gz
2.1G Dec  2 10:02 /MK2313-3RA_2.fq.gz

MK2313-3RA /MK2313-3RA_1.fq.gz /MK2313-3RA_2.fq.gz

May 23 23:56:03 ..... started STAR run
May 23 23:56:03 ..... loading genome
May 23 23:58:30 ..... started 1st pass mapping
May 24 05:05:39 ..... finished 1st pass mapping
May 24 05:05:41 ..... inserting junctions into the genome indices
May 24 05:08:35 ..... started mapping
May 24 15:42:39 ..... finished successfully
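
To see where the extra time goes, the timestamped progress lines above can be turned into per-stage durations. The following is only a minimal sketch: `star_stage_seconds` is a hypothetical helper name, it relies on plain awk, and it assumes every timestamp falls within the same calendar month.

```shell
# Print the seconds spent in each stage of a STAR run, reading timestamped
# progress lines such as "May 23 20:22:34 ..... started STAR run" on stdin.
# Assumption: all lines fall within the same calendar month.
star_stage_seconds() {
    awk '
        {
            split($3, t, ":")
            now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
            # The gap to the previous line belongs to the stage that line opened.
            if (NR > 1) printf "%6d s  %s\n", now - prev, label
            prev = now
            # Everything after the "....." separator names the stage.
            label = $0
            sub(/^.*\.\.\.\.\. /, "", label)
        }
    '
}

# Usage: feed it the progress lines of one run (here, the 3-hour sample).
star_stage_seconds <<'EOF'
May 23 20:22:34 ..... started STAR run
May 23 20:22:35 ..... loading genome
May 23 20:25:15 ..... started 1st pass mapping
May 23 21:10:43 ..... finished 1st pass mapping
May 23 21:10:44 ..... inserting junctions into the genome indices
May 23 21:13:32 ..... started mapping
May 23 23:14:01 ..... finished successfully
EOF
```

Applied to the two runs above, the first pass takes 2728 s versus 18429 s and the second pass 7229 s versus 38044 s, so both mapping passes are roughly five to seven times slower for MK2313-3RA, while genome loading and junction insertion take about the same time in both runs.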

Judging by the output file sizes alone, no obvious difference shows up:

 2.3G May 23 23:26 MK2305-4LA.bam
 2.4M May 23 23:27 MK2305-4LA.bam.bai
  71M May 23 23:13 MK2305-4LA_Chimeric.out.junction
 587M May 23 23:13 MK2305-4LA_Chimeric.out.sam
  20M May 23 23:49 MK2305-4LA.counts.txt
  332 May 23 23:49 MK2305-4LA.counts.txt.summary
  439 May 23 23:28 MK2305-4LA.flagstat
 1.9K May 23 23:13 MK2305-4LA_Log.final.out
  24K May 23 23:13 MK2305-4LA_Log.out
  15K May 23 23:13 MK2305-4LA_Log.progress.out
 8.1M May 23 23:13 MK2305-4LA_SJ.out.tab

 2.8G May 24 15:58 MK2313-3RA.bam
 2.5M May 24 15:59 MK2313-3RA.bam.bai
 108M May 24 15:42 MK2313-3RA_Chimeric.out.junction
 886M May 24 15:42 MK2313-3RA_Chimeric.out.sam
  20M May 24 16:27 MK2313-3RA.counts.txt
  332 May 24 16:27 MK2313-3RA.counts.txt.summary
  439 May 24 16:00 MK2313-3RA.flagstat
 1.9K May 24 15:42 MK2313-3RA_Log.final.out
  24K May 24 15:42 MK2313-3RA_Log.out
  42K May 24 15:42 MK2313-3RA_Log.progress.out
 8.2M May 24 15:42 MK2313-3RA_SJ.out.tab
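
File sizes say little about how the reads actually mapped, whereas each `Log.final.out` summarizes that directly. As a hedged sketch (the helper name `star_key_stats` is hypothetical, and the choice of fields to compare is my assumption about what might matter, not a diagnosis), the relevant lines of the two samples can be pulled out like this:

```shell
# Extract a few summary lines from a STAR Log.final.out as "name<TAB>value".
# The selection of fields is an assumption about what is worth comparing.
star_key_stats() {
    awk -F'|' '
        /Number of input reads|Mapping speed|Uniquely mapped reads %|% of reads mapped to multiple loci|% of reads unmapped: too short|% of reads chimeric/ {
            # Trim the padding STAR puts around the field name and the value.
            gsub(/^[ \t]+|[ \t]+$/, "", $1)
            gsub(/^[ \t]+|[ \t]+$/, "", $2)
            printf "%s\t%s\n", $1, $2
        }
    ' "$1"
}

# Usage: print the stats of both samples one after the other.
for s in MK2305-4LA MK2313-3RA; do
    f="${s}_Log.final.out"
    if [ -f "$f" ]; then
        echo "== $s"
        star_key_stats "$f"
    fi
done
```

If the slow sample shows a much higher share of multi-mapping, too-short, or chimeric reads, that would point to read content rather than input size as the reason for the slowdown.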

My STAR command is:

# Skip samples whose BAM file already exists.
if [ ! -f "$sample.bam" ]; then
    # Earlier single-pass command, kept for reference:
    #$bin_star --runThreadN  5  --genomeLoad  LoadAndKeep  --genomeDir $star_index  --readFilesCommand zcat --readFilesIn $analysis_dir/clean/${fq1_base}_val_1.fq.gz $analysis_dir/clean/${fq2_base}_val_2.fq.gz  --outFileNamePrefix  ${sample}_
    # Two-pass mapping with chimeric-read detection enabled.
    "$bin_star" --runThreadN 5 \
        --genomeDir "$star_index" \
        --twopassMode Basic --outReadsUnmapped None \
        --chimSegmentMin 12 \
        --alignIntronMax 100000 \
        --chimSegmentReadGapMax 3 \
        --alignSJstitchMismatchNmax 5 -1 5 5 \
        --readFilesCommand zcat \
        --readFilesIn "$analysis_dir/clean/${fq1_base}_val_1.fq.gz" \
            "$analysis_dir/clean/${fq2_base}_val_2.fq.gz" \
        --outFileNamePrefix "${sample}_"
fi
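
Compressed file size is only a rough proxy for how many reads STAR has to align, so before comparing run times it may help to count the records in the files passed to `--readFilesIn`. This is a small sketch under stated assumptions: `fastq_read_count` is a hypothetical helper, and it assumes standard four-line FASTQ records.

```shell
# Count the reads in a gzip-compressed FASTQ file.
# Assumption: every record spans exactly 4 lines (no wrapped sequences).
fastq_read_count() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Usage: pass any number of .fq.gz files as script arguments.
for fq in "$@"; do
    printf '%s\t%s\n' "$fq" "$(fastq_read_count "$fq")"
done
```

Dividing the read count by the mapping time gives a per-sample throughput that can be compared directly across runs.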
