A closer look at how picard's MarkDuplicates removes PCR duplicate reads

This post follows directly on the previous one, "A closer look at how samtools rmdup removes PCR duplicate reads".

As before, we look at single-end and paired-end sequencing separately, and compare the results of the two tools.

First, for the single-end data, samtools reported: [bam_rmdupse_core] 25 / 53 = 0.4717 in library

Running picard on the same data gave:

INFO 2016-11-12 09:48:29 MarkDuplicates Read 53 records. 0 pairs never matched.
INFO 2016-11-12 09:48:31 MarkDuplicates After buildSortedReadEndLists freeMemory: 248541856; totalMemory: 3887595520; maxMemory: 57266405376
INFO 2016-11-12 09:48:31 MarkDuplicates Will retain up to 1789575168 duplicate indices before spilling to disk.
INFO 2016-11-12 09:49:14 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2016-11-12 09:49:15 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2016-11-12 09:49:15 MarkDuplicates Sorting list of duplicate records.
INFO 2016-11-12 09:54:35 MarkDuplicates After generateDuplicateIndexes freeMemory: 3885082288; totalMemory: 18204327936; maxMemory: 57266405376
INFO 2016-11-12 09:54:35 MarkDuplicates Marking 25 records as duplicates.
INFO 2016-11-12 09:54:35 MarkDuplicates Found 0 optical duplicate clusters.
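To compare the picard log with the samtools summary line, the same ratio can be recomputed from the two relevant log lines. The following is a minimal sketch; the function name `duplicate_fraction` is made up for illustration, and it assumes the log contains the "Read N records" and "Marking N records as duplicates" lines in the form shown above.

```python
import re

def duplicate_fraction(log_text):
    """Parse a MarkDuplicates log and return (duplicates, total, fraction).

    Hypothetical helper: it only looks for the two log lines quoted in this post.
    """
    total = re.search(r"Read (\d+) records", log_text)
    dups = re.search(r"Marking (\d+) records as duplicates", log_text)
    if total is None or dups is None:
        raise ValueError("expected MarkDuplicates log lines not found")
    n_total = int(total.group(1))
    n_dups = int(dups.group(1))
    # Guard against an empty input file.
    fraction = n_dups / n_total if n_total else 0.0
    return n_dups, n_total, fraction

# Two lines copied from the single-end log above.
log = """\
INFO 2016-11-12 09:48:29 MarkDuplicates Read 53 records. 0 pairs never matched.
INFO 2016-11-12 09:54:35 MarkDuplicates Marking 25 records as duplicates.
"""

dups, total, frac = duplicate_fraction(log)
print(f"{dups} / {total} = {frac:.4f}")  # prints "25 / 53 = 0.4717"
```

This gives the same 25 / 53 = 0.4717 that samtools printed for the single-end data.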

 

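The results look the same: both tools found the same duplicates (25 of 53 reads). The drawback of this Java tool is that it is very slow. Going by the timestamps above, the run took about six minutes for only 53 records.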

Also, picard has no separate option for single-end versus paired-end data; the same command works for both.
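The exact commands used for this test are not shown in the post, so the following is only a sketch of how the two tools are typically invoked. The file names and the `picard.jar` path are placeholders, and the helper functions are made up for illustration. It uses picard's `I=`, `O=`, `M=` and `REMOVE_DUPLICATES=true` arguments and the samtools `rmdup -s` switch for single-end reads; check these against the versions you have installed.

```python
def picard_markduplicates_cmd(in_bam, out_bam, metrics, remove=True, jar="picard.jar"):
    """Build a picard MarkDuplicates command line (same for single-end and paired-end)."""
    cmd = ["java", "-jar", jar, "MarkDuplicates",
           f"I={in_bam}", f"O={out_bam}", f"M={metrics}"]
    if remove:
        # Without this, duplicates are only flagged, not dropped from the output.
        cmd.append("REMOVE_DUPLICATES=true")
    return cmd

def samtools_rmdup_cmd(in_bam, out_bam, single_end=False):
    """Build a samtools rmdup command line; single-end data needs the -s switch."""
    cmd = ["samtools", "rmdup"]
    if single_end:
        cmd.append("-s")
    cmd += [in_bam, out_bam]
    return cmd

# Placeholder file names, not the ones from the test archive.
print(" ".join(picard_markduplicates_cmd("input.bam", "picard.rmdup.bam", "metrics.txt")))
print(" ".join(samtools_rmdup_cmd("input.bam", "samtools.rmdup.bam", single_end=True)))
```

The lists can be passed to `subprocess.run(cmd, check=True)` once the tools and input files are in place.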

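Next I tested the paired-end data. Again there was no difference: both tools removed 4 records. This may simply be because my test dataset is too small.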

INFO 2016-11-12 09:57:45 MarkDuplicates Read 30 records. 3 pairs never matched.
INFO 2016-11-12 09:57:47 MarkDuplicates After buildSortedReadEndLists freeMemory: 248541896; totalMemory: 3887595520; maxMemory: 57266405376
INFO 2016-11-12 09:57:47 MarkDuplicates Will retain up to 1789575168 duplicate indices before spilling to disk.
INFO 2016-11-12 09:58:26 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2016-11-12 09:58:26 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2016-11-12 09:58:26 MarkDuplicates Sorting list of duplicate records.
INFO 2016-11-12 10:02:59 MarkDuplicates After generateDuplicateIndexes freeMemory: 3885083112; totalMemory: 18204327936; maxMemory: 57266405376
INFO 2016-11-12 10:02:59 MarkDuplicates Marking 4 records as duplicates.
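To make the idea of a paired-end duplicate concrete, here is a simplified sketch. It assumes that two pairs are duplicates when both mates share the same chromosome, 5' positions and strands, and that the pair with the highest score is kept. The record layout, the `score` field and the toy data are hypothetical; this is not the actual implementation of picard or samtools, and the toy pairs are not the reads from the test archive.

```python
from collections import defaultdict

def mark_duplicate_pairs(pairs):
    """Return the names of pairs that would be marked as duplicates.

    Each pair is a dict with keys: name, chrom, pos1, strand1, pos2, strand2, score.
    """
    groups = defaultdict(list)
    for p in pairs:
        # Pairs with identical coordinates and orientations fall into one group.
        key = (p["chrom"], p["pos1"], p["strand1"], p["pos2"], p["strand2"])
        groups[key].append(p)

    duplicates = set()
    for members in groups.values():
        # Keep the best-scoring pair; on ties, max() keeps the first one seen.
        best = max(members, key=lambda p: p["score"])
        for p in members:
            if p is not best:
                duplicates.add(p["name"])
    return duplicates

# Toy data for illustration only.
pairs = [
    {"name": "a", "chrom": "chr1", "pos1": 100, "strand1": "+", "pos2": 300, "strand2": "-", "score": 60},
    {"name": "b", "chrom": "chr1", "pos1": 100, "strand1": "+", "pos2": 300, "strand2": "-", "score": 50},
    {"name": "c", "chrom": "chr1", "pos1": 100, "strand1": "+", "pos2": 350, "strand2": "-", "score": 55},
    {"name": "d", "chrom": "chr2", "pos1": 500, "strand1": "+", "pos2": 700, "strand2": "-", "score": 40},
    {"name": "e", "chrom": "chr2", "pos1": 500, "strand1": "+", "pos2": 700, "strand2": "-", "score": 70},
]

dup_names = mark_duplicate_pairs(pairs)
print(sorted(dup_names))   # prints ['b', 'd']
print(2 * len(dup_names))  # prints 4: each duplicate pair contributes two records
```

In this toy example pair `c` survives because its second mate starts at a different position, even though its first mate matches `a` and `b`.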

 

The test data and scripts are available for download: http://www.biotrainee.com/jmzeng/rmDuplicate.zip

 
