<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; MarkDuplicates</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/markduplicates/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee.</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>A close look at how Picard's MarkDuplicates removes PCR duplicate reads</title>
		<link>http://www.bio-info-trainee.com/2008.html</link>
		<comments>http://www.bio-info-trainee.com/2008.html#comments</comments>
		<pubDate>Sat, 12 Nov 2016 02:11:23 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic software]]></category>
		<category><![CDATA[Bioinformatics basics]]></category>
		<category><![CDATA[MarkDuplicates]]></category>
		<category><![CDATA[pcr]]></category>
		<category><![CDATA[picard]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2008</guid>
		<description><![CDATA[This post follows up on the earlier one, A close look at how samtools rmdup removes PCR duplicate rea &#8230; <a href="http://www.bio-info-trainee.com/2008.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This post follows up on the earlier one, <a title="Read more: A close look at how samtools rmdup removes PCR duplicate reads" href="http://www.bio-info-trainee.com/2003.html" rel="bookmark">A close look at how samtools rmdup removes PCR duplicate reads</a>.</p>
<p>As before, we look at single-end and paired-end sequencing data separately and compare the results of the two tools.</p>
<p>First, for the single-end data, samtools reported: [bam_rmdupse_core] 25 / 53 = 0.4717 in library<span id="more-2008"></span></p>
<p>And here is what I got from Picard:</p>
<blockquote><p>INFO 2016-11-12 09:48:29 MarkDuplicates <strong><span style="color: #ff00ff;">Read 53 records. 0 pairs never matched.</span></strong><br />
INFO 2016-11-12 09:48:31 MarkDuplicates After buildSortedReadEndLists freeMemory: 248541856; totalMemory: 3887595520; maxMemory: 57266405376<br />
INFO 2016-11-12 09:48:31 MarkDuplicates Will retain up to 1789575168 duplicate indices before spilling to disk.<br />
INFO 2016-11-12 09:49:14 MarkDuplicates Traversing read pair information and detecting duplicates.<br />
INFO 2016-11-12 09:49:15 MarkDuplicates Traversing fragment information and detecting duplicates.<br />
INFO 2016-11-12 09:49:15 MarkDuplicates Sorting list of duplicate records.<br />
INFO 2016-11-12 09:54:35 MarkDuplicates After generateDuplicateIndexes freeMemory: 3885082288; totalMemory: 18204327936; maxMemory: 57266405376<br />
INFO 2016-11-12 09:54:35 MarkDuplicates <span style="color: #ff00ff;"><strong>Marking 25 records as duplicates.</strong></span><br />
INFO 2016-11-12 09:54:35 MarkDuplicates Found 0 optical duplicate clusters.</p>
<p>&nbsp;</p></blockquote>
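<p>There appears to be no difference: both tools find the same duplicates (25 of 53 records). The drawback of a Java tool like this one is that it is extremely slow. In the log above, about six minutes pass between reading the 53 records and marking the duplicates.</p>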
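<p>To put the two reports side by side, the Picard counts can be turned into the same ratio that samtools prints. The following is only a minimal sketch: the function name <code>duplicate_fraction</code> is hypothetical, and it assumes the log contains the two INFO lines highlighted above in exactly that wording.</p>

```python
import re


def duplicate_fraction(log_text):
    """Parse the record counts from MarkDuplicates INFO lines.

    Hypothetical helper: it assumes the log contains a "Read N records"
    line and a "Marking M records as duplicates" line, as in the log above.
    """
    read_match = re.search(r"Read (\d+) records", log_text)
    mark_match = re.search(r"Marking (\d+) records as duplicates", log_text)
    if read_match is None or mark_match is None:
        raise ValueError("expected MarkDuplicates INFO lines not found")
    total = int(read_match.group(1))
    marked = int(mark_match.group(1))
    return marked, total, marked / total


# The two relevant lines from the single-end run shown above.
log_text = (
    "INFO 2016-11-12 09:48:29 MarkDuplicates Read 53 records. 0 pairs never matched.\n"
    "INFO 2016-11-12 09:54:35 MarkDuplicates Marking 25 records as duplicates.\n"
)

marked, total, fraction = duplicate_fraction(log_text)
print(f"{marked} / {total} = {fraction:.4f}")  # prints "25 / 53 = 0.4717"
```

<p>The printed ratio matches the 25 / 53 = 0.4717 reported by samtools for the same single-end data.</p>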
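<p>Picard also has no separate parameter for single-end versus paired-end data; the same command works for both.</p>
<p>Next I tested the paired-end data. Again there is no difference: both tools removed 4 reads, though my test data set may simply be too small to show one.</p>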
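<p>As a rough illustration of "the same command for both", the sketch below assembles one MarkDuplicates command line and reuses it for two inputs. The post does not show the exact command that was run, so everything here is an assumption: the helper name <code>mark_duplicates_cmd</code>, the jar path <code>picard.jar</code>, and all BAM and metrics file names are hypothetical, and the classic <code>I=</code>/<code>O=</code>/<code>M=</code> argument style is assumed.</p>

```python
def mark_duplicates_cmd(input_bam, output_bam, metrics_file,
                        picard_jar="picard.jar", remove=False):
    """Build an argument list for Picard MarkDuplicates.

    The jar path and all file names are placeholders, not taken from the
    post. No single-end/paired-end switch is passed, because the tool does
    not need one.
    """
    cmd = [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={input_bam}",      # input BAM
        f"O={output_bam}",     # output BAM
        f"M={metrics_file}",   # duplication metrics file
    ]
    if remove:
        # By default duplicates are only flagged; this drops them from the output.
        cmd.append("REMOVE_DUPLICATES=true")
    return cmd


# Hypothetical file names: the identical builder serves both data types.
for name in ("single_end", "paired_end"):
    cmd = mark_duplicates_cmd(f"{name}.bam", f"{name}.marked.bam",
                              f"{name}.metrics.txt")
    print(" ".join(cmd))
```

<p>The list form can be handed to <code>subprocess.run</code> as-is once real paths are filled in.</p>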
<blockquote><p>INFO 2016-11-12 09:57:45 MarkDuplicates<strong><span style="color: #ff00ff;"> Read 30 records. 3 pairs never matched.</span></strong><br />
INFO 2016-11-12 09:57:47 MarkDuplicates After buildSortedReadEndLists freeMemory: 248541896; totalMemory: 3887595520; maxMemory: 57266405376<br />
INFO 2016-11-12 09:57:47 MarkDuplicates Will retain up to 1789575168 duplicate indices before spilling to disk.<br />
INFO 2016-11-12 09:58:26 MarkDuplicates Traversing read pair information and detecting duplicates.<br />
INFO 2016-11-12 09:58:26 MarkDuplicates Traversing fragment information and detecting duplicates.<br />
INFO 2016-11-12 09:58:26 MarkDuplicates Sorting list of duplicate records.<br />
INFO 2016-11-12 10:02:59 MarkDuplicates After generateDuplicateIndexes freeMemory: 3885083112; totalMemory: 18204327936; maxMemory: 57266405376<br />
INFO 2016-11-12 10:02:59 MarkDuplicates <strong><span style="color: #ff00ff;">Marking 4 records as duplicates.</span></strong></p>
<p>&nbsp;</p></blockquote>
<p>The test data is available for download; the archive contains both the scripts and the test data: <a href="http://www.biotrainee.com/jmzeng/rmDuplicate.zip" target="_blank">http://www.biotrainee.com/jmzeng/rmDuplicate.zip</a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2008.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
