<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; adaptor</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/adaptor/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
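	<description>Feel free to leave a message and join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>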
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Quality control of Illumina data with Trimmomatic: removing adapters and low-quality bases</title>
		<link>http://www.bio-info-trainee.com/1958.html</link>
		<comments>http://www.bio-info-trainee.com/1958.html#comments</comments>
		<pubDate>Sat, 22 Oct 2016 02:50:42 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础软件]]></category>
		<category><![CDATA[adaptor]]></category>
		<category><![CDATA[illumina]]></category>
		<category><![CDATA[Trimmomatic]]></category>
		<category><![CDATA[接头]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1958</guid>
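		<description><![CDATA[The data I had been getting from the company was always very clean, so I never paid much attention to quality control. Recently I got some raw &#8230; <a href="http://www.bio-info-trainee.com/1958.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>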
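				<content:encoded><![CDATA[<div>The data I had been getting from the company was always very clean, so I never paid much attention to quality control. Recently I got some raw data and found out there is a lot more to it than I thought. Until now I had been removing adapters with cutadapt, a Python tool, but it has one drawback: you have to specify the adapter sequences yourself. A friend recommended Trimmomatic, which is written in Java, so I Googled its official site, downloaded the binary release, and unzipped it. It is ready to use right away!</div>
<div><strong><span style="color: #ff0000;">For my Illumina sequencing data, at least, running it directly is enough to turn raw data into clean data!</span></strong></div>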
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/11.png"><img class="alignnone size-full wp-image-1959" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/11.png" alt="1" width="599" height="200" /></a></div>
<p><span id="more-1958"></span></p>
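<div>This software is designed specifically for Illumina sequencing data, as the limited set of adapter files it ships with shows (see the figure above). In typical use it only removes the TruSeq Universal Adapter. A run only counts as successful if it finishes without errors!</div>
<div>The official site has a simple example: <a href="http://www.usadellab.org/cms/?page=trimmomatic">http://www.usadellab.org/cms/?page=trimmomatic</a></div>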
<div>Paired End:</div>
<div>java -jar trimmomatic-0.35.jar PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 ## so all you need to do is put each argument in the right position!</div>
<div>This will perform the following:</div>
<ul>
<li>Remove adapters (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10)</li>
<li>Remove leading low quality or N bases (below quality 3) (LEADING:3)</li>
<li>Remove trailing low quality or N bases (below quality 3) (TRAILING:3)</li>
<li>Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)</li>
<li>Drop reads shorter than 36 bases (MINLEN:36)</li>
</ul>
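<div>To make the quality-trimming steps above concrete, here is a minimal Python sketch of LEADING, TRAILING, SLIDINGWINDOW and MINLEN applied in that order to one read's quality scores. It is a simplified illustration under my own assumptions, not Trimmomatic's actual implementation: it skips ILLUMINACLIP entirely, and the function name <code>trim_read</code> and its parameters are hypothetical.</div>

```python
def trim_read(quals, leading=3, trailing=3, window=4, min_avg=15, minlen=36):
    """Return (start, end) of the kept region, or None if the read is dropped.

    `quals` is a list of per-base Phred scores. Simplified sketch only;
    the real tool's sliding-window cut may differ in detail.
    """
    start, end = 0, len(quals)
    # LEADING: strip bases below the threshold from the 5' end
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: strip bases below the threshold from the 3' end
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: cut at the first window whose mean quality is too low
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break
    # MINLEN: drop the read if too little is left
    if end - start < minlen:
        return None
    return start, end


# Two poor leading bases, 50 good bases, then a low-quality tail
example = [2, 2] + [30] * 50 + [5] * 8
print(trim_read(example))
```

<div>Note that in this sketch the cut lands at the start of the first failing window, so a good base just before a bad tail can be lost as well.</div>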
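<div>The example parameters above are usually good enough. Processing is a bit slow: even with 10 threads it took more than ten minutes to get through a 2 GB fq.gz sequencing file. The log from my run (which used the stricter settings LEADING:10 TRAILING:20 SLIDINGWINDOW:4:25) looks like this:</div>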
<div>TrimmomaticPE: Started with arguments:</div>
<div>-threads 10 -phred33 -trimlog tmp.log CHG006373_R1.fastq.gz CHG006373_R2.fastq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:/home/jmzeng//biosoft/trimmomatic/Trimmomatic-0.36/adapters/TruSeq3-PE.fa:2:30:10 LEADING:10 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:36</div>
<div>Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'</div>
<div>ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences</div>
<div>Input Read Pairs: 21427010 Both Surviving: 14507723 (67.71%) Forward Only Surviving: 5297811 (24.72%) Reverse Only Surviving: 375547 (1.75%) Dropped: 1245929 (5.81%)</div>
<div>TrimmomaticPE: Completed successfully</div>
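<div>The summary line in the log above is easy to check by script. The following Python sketch parses the line with a regular expression and confirms that the four outcome categories add up to the number of input pairs. The function name <code>parse_summary</code> and the dictionary keys are my own choices, and the pattern only assumes the line looks like the one in this log.</div>

```python
import re

SUMMARY = ("Input Read Pairs: 21427010 Both Surviving: 14507723 (67.71%) "
           "Forward Only Surviving: 5297811 (24.72%) "
           "Reverse Only Surviving: 375547 (1.75%) Dropped: 1245929 (5.81%)")


def parse_summary(line):
    """Extract the five read-pair counts from a TrimmomaticPE summary line."""
    pattern = (r"Input Read Pairs: (\d+) Both Surviving: (\d+) .*?"
               r"Forward Only Surviving: (\d+) .*?"
               r"Reverse Only Surviving: (\d+) .*?Dropped: (\d+)")
    m = re.search(pattern, line)
    if m is None:
        raise ValueError("not a TrimmomaticPE summary line")
    keys = ("input", "both", "forward_only", "reverse_only", "dropped")
    return dict(zip(keys, map(int, m.groups())))


counts = parse_summary(SUMMARY)
outcomes = ("both", "forward_only", "reverse_only", "dropped")
# The four outcome categories should account for every input pair.
assert sum(counts[k] for k in outcomes) == counts["input"]
for key in outcomes:
    print(f"{key}: {100 * counts[key] / counts['input']:.2f}%")
```

<div>In this run almost a quarter of the pairs kept only the forward read, which suggests the reverse reads were noticeably worse than the forward ones under these settings.</div>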
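<div>Remember to give the full path to the adapter file (unless the file sits in your current working directory)!!!</div>
<div>You can see that it used an adapter from its bundled file TruSeq3-PE.fa. The sequence TACACTCTTTCCCTACACGACGCTCTTCCGATCT is actually just the second half of the TruSeq Universal Adapter (the adapter sequences can be found at <a href="https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt">https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt</a>). Searching the R1 file directly shows that there is still other sequence between strings such as AAAAAAAAAAAAATTTTTTTTTTTTTTTTT and the adapter TACACTCTTTCCCTACACGACGCTCTTCCGATCT:</div>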
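<div>One way to avoid path mistakes is to build the command in a script and resolve the adapter path there. The Python sketch below only assembles and prints the argument list; it does not run anything. The helper name <code>build_pe_command</code>, the output prefix, and the jar file name are assumptions for illustration, while the trimming steps are the ones from the official example.</div>

```python
import os
import shlex


def build_pe_command(jar, adapters, r1, r2, prefix="output", threads=1):
    """Assemble a TrimmomaticPE argument list; nothing is executed here."""
    outputs = [f"{prefix}_{name}.fq.gz" for name in
               ("forward_paired", "forward_unpaired",
                "reverse_paired", "reverse_unpaired")]
    # Resolve the adapter file to an absolute path before it goes into the step
    clip = f"ILLUMINACLIP:{os.path.abspath(adapters)}:2:30:10"
    return (["java", "-jar", jar, "PE", "-threads", str(threads), "-phred33",
             r1, r2] + outputs +
            [clip, "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36"])


# The jar name and relative adapter path here are placeholders
cmd = build_pe_command("trimmomatic-0.36.jar", "adapters/TruSeq3-PE.fa",
                       "CHG006373_R1.fastq.gz", "CHG006373_R2.fastq.gz",
                       threads=10)
print(shlex.join(cmd))
```

<div>The resulting list could be passed to <code>subprocess.run(cmd, check=True)</code>, which raises an exception if the run exits with an error.</div>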
<div><img class="alignnone size-full wp-image-1960" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/21.png" alt="2" width="1082" height="460" /></div>
<div></div>
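<div>Both observations, that the prefix is the tail of the Universal Adapter and that it can be found by searching the reads, can be checked with a few lines of Python. This is a rough sketch: the Universal Adapter sequence is the one I believe is listed in the contaminant list linked above (treat it as an assumption and compare it against the file), the function <code>count_reads_containing</code> is a hypothetical helper, and it only finds exact matches.</div>

```python
import gzip

# Assumption: TruSeq Universal Adapter as given in the FastQC contaminant list
UNIVERSAL_ADAPTER = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
# Forward prefix reported in the Trimmomatic log
PREFIX_FORWARD = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"


def count_reads_containing(fastq_gz, motif):
    """Return (hits, total): reads whose sequence contains `motif` exactly."""
    total = hits = 0
    with gzip.open(fastq_gz, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # the sequence is the 2nd line of each record
                total += 1
                hits += motif in line
    return hits, total


# Is the prefix really the tail of the full adapter?
print(UNIVERSAL_ADAPTER.endswith(PREFIX_FORWARD))
```

<div>Calling <code>count_reads_containing("CHG006373_R1.fastq.gz", PREFIX_FORWARD)</code> would then give a quick count of reads that carry the prefix, without tolerating any mismatches.</div>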
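<div>Take the first read as an example: Trimmomatic put it into output_forward_unpaired.fq.gz and did not even bother to remove its adapter, because its mate, the reverse read, fared even worse!</div>
<div>Checking the files, I found that some reads were trimmed purely on quality scores, because the removed part has nothing whatsoever to do with the adapter!</div>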
<div><img class="alignnone size-full wp-image-1961" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/3.png" alt="3" width="1106" height="358" /></div>
<div></div>
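<div>Because it removes adapters and low-quality bases in the same run, it is hard for me to work out exactly how it removes the adapters, which is frustrating. Still, it works very well on my Illumina data, judging by the high percentage that gets removed.</div>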
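<div>One possible way to dig further is the file written by the -trimlog option used in the run above. As far as I understand the Trimmomatic documentation, each line holds the read name followed by four numbers: surviving length, amount trimmed from the start, position of the last surviving base, and amount trimmed from the end. The Python sketch below parses one such line under that assumption; the helper name and the sample line are made up for illustration.</div>

```python
def parse_trimlog_line(line):
    """Split one -trimlog line into the read name and its four numbers.

    Assumed layout: name, surviving length, trimmed from start,
    last surviving base, trimmed from end.
    """
    # Read names may contain spaces, so peel the four numbers off the right
    name, kept, start, last, end_trim = line.rstrip("\n").rsplit(" ", 4)
    return {
        "name": name,
        "kept": int(kept),
        "trimmed_start": int(start),
        "last_base": int(last),
        "trimmed_end": int(end_trim),
    }


# Hypothetical log line, not taken from a real run
record = parse_trimlog_line("READ1 1:N:0:1 90 0 90 10\n")
print(record["name"], record["trimmed_start"], record["trimmed_end"])
```

<div>Comparing these numbers for a handful of reads against the raw sequences should show where each read was cut, although it still does not say which step made the cut.</div>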
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1958.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
