<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; 接头</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e6%8e%a5%e5%a4%b4/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
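	<description>Feel free to leave comments and join the discussion on the forum biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>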
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Quality control of Illumina data with Trimmomatic: removing adapters and low-quality bases</title>
		<link>http://www.bio-info-trainee.com/1958.html</link>
		<comments>http://www.bio-info-trainee.com/1958.html#comments</comments>
		<pubDate>Sat, 22 Oct 2016 02:50:42 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础软件]]></category>
		<category><![CDATA[adaptor]]></category>
		<category><![CDATA[illumina]]></category>
		<category><![CDATA[Trimmomatic]]></category>
		<category><![CDATA[接头]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1958</guid>
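		<description><![CDATA[Because the data I have been getting from the company was always very clean, I never paid much attention to quality control. Recently I got raw &#8230; <a href="http://www.bio-info-trainee.com/1958.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>Because the data I have been getting from the company was always very clean, I never paid much attention to quality control. Only recently, when I received raw data, did I realize how many subtleties are involved. Until now I had been removing adapters with cutadapt, a Python tool, but it has one drawback: you have to specify the adapters yourself. A friend recommended Trimmomatic, a Java program, so I found its official website via Google, downloaded the binary release, and unzipped it. It is ready to use right away.</div>
<div><strong><span style="color: #ff0000;">For my Illumina sequencing data, at least, it turns raw data into clean data in a single step!</span></strong></div>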
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/11.png"><img class="alignnone size-full wp-image-1959" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/11.png" alt="1" width="599" height="200" /></a></div>
<p><span id="more-1958"></span></p>
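<div>This software was designed specifically for Illumina sequencing data, and the set of adapter files it ships with is limited, as the figure above shows. In most cases only the TruSeq Universal Adapter is removed. A run counts as successful only if it finishes without reporting an error.</div>
<div>The official website has a simple example: <a href="http://www.usadellab.org/cms/?page=trimmomatic">http://www.usadellab.org/cms/?page=trimmomatic</a></div>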
<div>Paired End:</div>
<div>java -jar trimmomatic-0.35.jar PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 ## so you only need to put each argument in the right position!</div>
<div>This will perform the following:</div>
<ul>
<li>Remove adapters (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10)</li>
<li>Remove leading low quality or N bases (below quality 3) (LEADING:3)</li>
<li>Remove trailing low quality or N bases (below quality 3) (TRAILING:3)</li>
<li>Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)</li>
<li>Drop reads shorter than 36 bases (MINLEN:36)</li>
</ul>
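<div>The quality-based steps in the list above can be sketched in a few lines of Python. This is a simplified illustration written for this post, not Trimmomatic's actual implementation: the function names are made up, the window cut is not refined the way the real tool may do it, and adapter clipping (ILLUMINACLIP) is left out entirely. Qualities are assumed to be Phred+33 encoded, as with -phred33.</div>

```python
def phred33(quality_string):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_leading(quals, threshold):
    """Index of the first base whose quality is at least threshold."""
    start = 0
    while start != len(quals) and threshold > quals[start]:
        start += 1
    return start


def trim_trailing(quals, threshold):
    """End index after dropping trailing bases whose quality is below threshold."""
    end = len(quals)
    while end != 0 and threshold > quals[end - 1]:
        end -= 1
    return end


def sliding_window_cut(quals, window, min_avg):
    """Number of bases to keep: cut at the start of the first window
    whose mean quality is below min_avg (no further refinement)."""
    for i in range(len(quals) - window + 1):
        mean = sum(quals[i:i + window]) / window
        if min_avg > mean:
            return i
    return len(quals)


def trim_read(seq, qual, leading=3, trailing=3, window=4, min_avg=15, minlen=36):
    """Apply LEADING, TRAILING, SLIDINGWINDOW and MINLEN in that order.
    Returns (seq, qual), or None when the read ends up shorter than minlen."""
    quals = phred33(qual)
    start = trim_leading(quals, leading)
    seq, qual, quals = seq[start:], qual[start:], quals[start:]
    end = trim_trailing(quals, trailing)
    seq, qual, quals = seq[:end], qual[:end], quals[:end]
    keep = sliding_window_cut(quals, window, min_avg)
    seq, qual = seq[:keep], qual[:keep]
    if minlen > len(seq):
        return None
    return seq, qual
```

<div>For example, a 50-base read whose last 10 bases have quality 10 (the "+" character) is cut back to 40 bases by the 4-base window, while a 4-base read is dropped by the length filter.</div>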
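<div>These example parameters are usually fine. Processing is a bit slow: even with 10 threads it took more than ten minutes to get through a 2 GB fq.gz sequencing file. The log is shown below (note that this particular run used stricter thresholds than the example above: LEADING:10 TRAILING:20 SLIDINGWINDOW:4:25):</div>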
<div>TrimmomaticPE: Started with arguments:</div>
<div>-threads 10 -phred33 -trimlog tmp.log CHG006373_R1.fastq.gz CHG006373_R2.fastq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:/home/jmzeng//biosoft/trimmomatic/Trimmomatic-0.36/adapters/TruSeq3-PE.fa:2:30:10 LEADING:10 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:36</div>
<div>Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'</div>
<div>ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences</div>
<div>Input Read Pairs: 21427010 Both Surviving: 14507723 (67.71%) Forward Only Surviving: 5297811 (24.72%) Reverse Only Surviving: 375547 (1.75%) Dropped: 1245929 (5.81%)</div>
<div>TrimmomaticPE: Completed successfully</div>
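<div>As a quick sanity check on the summary line in the log above, the counts can be pulled out with a regular expression and compared against the total. This is only a sketch: the helper name parse_summary is made up for illustration, and the line format is assumed to be exactly the one shown in this log.</div>

```python
import re

# Summary line copied from the TrimmomaticPE log above.
summary = ("Input Read Pairs: 21427010 Both Surviving: 14507723 (67.71%) "
           "Forward Only Surviving: 5297811 (24.72%) "
           "Reverse Only Surviving: 375547 (1.75%) Dropped: 1245929 (5.81%)")

LABELS = ["Input Read Pairs", "Both Surviving", "Forward Only Surviving",
          "Reverse Only Surviving", "Dropped"]


def parse_summary(line):
    """Extract the five read-pair counts from a summary line."""
    counts = {}
    for label in LABELS:
        match = re.search(re.escape(label) + r": (\d+)", line)
        counts[label] = int(match.group(1))
    return counts


counts = parse_summary(summary)
total = counts["Input Read Pairs"]
# The four outcome categories should add up to the number of input pairs.
accounted = sum(counts[label] for label in LABELS[1:])
percent_both = round(100 * counts["Both Surviving"] / total, 2)
print(accounted == total, percent_both)
```

<div>Here the four categories add up to the 21427010 input pairs, and the share of pairs where both reads survive works out to the 67.71% reported in the log.</div>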
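<div>Remember to always give the full path to the adapter file!</div>
<div>As you can see, it used the adapter TACACTCTTTCCCTACACGACGCTCTTCCGATCT from its bundled TruSeq3-PE.fa file, which is really just the second half of the TruSeq Universal Adapter (the adapter information can be found at <a href="https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt">https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt</a>). Searching the R1 file directly shows that there is still some sequence between strings such as AAAAAAAAAAAAATTTTTTTTTTTTTTTTT and the adapter TACACTCTTTCCCTACACGACGCTCTTCCGATCT:</div>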
<div><img class="alignnone size-full wp-image-1960" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/21.png" alt="2" width="1082" height="460" /></div>
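<div>The claim that the PrefixPair sequence is the tail of the TruSeq Universal Adapter is easy to check in Python, and the same idea can be used to search an R1 file directly. A minimal sketch: the helper count_reads_with is a made-up name, it assumes a plain 4-line-per-record gzipped FASTQ, and it only looks for exact matches.</div>

```python
import gzip

# Full TruSeq Universal Adapter and the forward PrefixPair sequence from the log.
UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
PREFIX_FWD = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"

# Where the PrefixPair sequence starts inside the full adapter (0-based).
offset = UNIVERSAL.find(PREFIX_FWD)
is_suffix = UNIVERSAL.endswith(PREFIX_FWD)


def count_reads_with(fastq_gz_path, motif):
    """Count reads in a gzipped FASTQ file whose sequence contains motif."""
    hits = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a 4-line FASTQ record the sequence is the second line.
            if line_number % 4 == 1 and motif in line:
                hits += 1
    return hits
```

<div>With the files from the log above, a call would look like count_reads_with("CHG006373_R1.fastq.gz", PREFIX_FWD).</div>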
<div></div>
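<div>Take the first read as an example: Trimmomatic put it into output_forward_unpaired.fq.gz and did not bother to remove its adapter, because its mate on the reverse side was in even worse shape.</div>
<div>Inspecting the files also shows that some bases were removed purely on the basis of quality values, because the trimmed parts have nothing to do with any adapter.</div>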
<div><img class="alignnone size-full wp-image-1961" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/3.png" alt="3" width="1106" height="358" /></div>
<div></div>
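<div>Because it removes adapters and low-quality bases in the same pass, it is hard for me to work out exactly how it removes adapters, which is frustrating. Still, it works very well on Illumina data, judging by the high percentage that gets removed.</div>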
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1958.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Removing adapters from paired-end sequencing data with cutadapt</title>
		<link>http://www.bio-info-trainee.com/1920.html</link>
		<comments>http://www.bio-info-trainee.com/1920.html#comments</comments>
		<pubDate>Thu, 06 Oct 2016 14:32:25 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[基础软件]]></category>
		<category><![CDATA[cutadapt]]></category>
		<category><![CDATA[接头]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1920</guid>
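		<description><![CDATA[Generally speaking, QC of sequencing data comes down to three main tasks: Quality trimmin &#8230; <a href="http://www.bio-info-trainee.com/1920.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Generally speaking, QC of sequencing data comes down to three main tasks: quality trimming, adapter removal, and contaminant filtering. With paired-end data, adapter removal also discards reads that become too short, which easily leaves the forward and reverse files with different numbers of reads. Some tools handle this problem well; my preferred solution is cutadapt in PE mode. Adapter removal matters most for de novo genome or transcriptome assembly, where the adapters must be removed completely.<br />
cutadapt is a classic Python tool. My Linux server has some issue, possibly with root permissions, so pip install cutadapt did not succeed, and I did not bother to dig into it. You can also download the cutadapt source code yourself and run python setup.py install --user inside the source directory, which installs it under your own ~/.local/bin.<br />
In the end I installed cutadapt with conda (<a href="http://www.bio-info-trainee.com/1906.html" target="_blank">http://www.bio-info-trainee.com/1906.html</a>), so I have to call it as python ~/miniconda2/pkgs/cutadapt-1.10-py27_0/bin/cutadapt --help. That is not a big problem, since I am only trying it out.<span id="more-1920"></span></p>
<p>First, run a quick check of the sequencing data with FastQC to see which adapters are present:</p>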
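<p>To make the balance problem concrete, here is a small Python sketch of pair-aware length filtering. It is an illustration only, not cutadapt's code: the function name filter_pairs is made up, reads are plain strings, and the "any" and "both" modes are modelled on the --pair-filter option described in the cutadapt help text.</p>

```python
def filter_pairs(reads1, reads2, min_length, pair_filter="any"):
    """Drop read pairs that are too short while keeping both lists in sync.

    pair_filter="any":  discard the pair if either read is too short.
    pair_filter="both": discard the pair only if both reads are too short.
    """
    kept1, kept2 = [], []
    for r1, r2 in zip(reads1, reads2):
        short1 = min_length > len(r1)
        short2 = min_length > len(r2)
        if pair_filter == "any":
            discard = short1 or short2
        else:
            discard = short1 and short2
        if not discard:
            # Both mates are kept or dropped together, never one alone.
            kept1.append(r1)
            kept2.append(r2)
    return kept1, kept2


forward = ["A" * 60, "A" * 10, "A" * 10]
reverse = ["C" * 60, "C" * 60, "C" * 5]
any_1, any_2 = filter_pairs(forward, reverse, 50, "any")
both_1, both_2 = filter_pairs(forward, reverse, 50, "both")
print(len(any_1), len(any_2), len(both_1), len(both_2))
```

<p>Whichever mode is chosen, the two output lists always have the same length, which is the property that filtering each file on its own cannot guarantee.</p>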
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/clipboard.png"><img class="alignnone size-full wp-image-1922" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/clipboard.png" alt="clipboard" width="326" height="267" /></a></p>
<div>Since FastQC can detect your adapters, it must ship with a built-in list of adapter sequences, which can be found on GitHub: <a href="https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt">https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt</a></div>
<div>Alternatively, download common Illumina adapters from <a href="https://github.com/vsbuffalo/scythe/blob/master/illumina_adapters.fa" rel="nofollow">https://github.com/vsbuffalo/scythe/blob/master/illumina_adapters.fa</a></div>
<div><span style="color: #ff0000;">TruSeq Universal Adapter</span> AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT</div>
<div><span style="color: #ff0000;">Illumina Small RNA 3p Adapter 1</span> ATCTCGTATGCCGTCTTCTGCTTG</div>
<div>The most abundant one is usually the TruSeq Universal Adapter, and the other adapters FastQC reports may simply be parts of this same TruSeq Universal Adapter. We can try removing it with cutadapt first.</div>
<div>
<div>cutadapt supports PE sequencing data. The basic usage is:</div>
<div>cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq</div>
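<div>-a and -A give the 3' adapters for the first and second read of each pair; -g and -G give the 5' adapters for the first and second read.</div>
<div>Gzip-compressed fastq and fasta files are supported; if necessary, use the -f option to specify the input file format.</div>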
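<div>What removing a 3' adapter means can be sketched as follows. This is a deliberately simplified, exact-match illustration with a made-up function name; cutadapt itself uses error-tolerant alignment (the -e option), so treat this only as a picture of the idea. The min_overlap argument plays the role of -O.</div>

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter by exact matching (no mismatches or indels).

    A full occurrence anywhere in the read is cut together with everything
    after it. Otherwise the longest read suffix that equals an adapter
    prefix of at least min_overlap bases is removed.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    longest = min(len(read), len(adapter) - 1)
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


# One of the adapter sequences from the FastQC contaminant list.
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"
full = trim_3prime_adapter("ACGTACGTAC" + ADAPTER + "TTTT", ADAPTER, 5)
partial = trim_3prime_adapter("ACGTACGTAC" + ADAPTER[:6], ADAPTER, 5)
too_short = trim_3prime_adapter("ACGTACGTAC" + ADAPTER[:3], ADAPTER, 5)
print(full, partial, too_short)
```

<div>With a minimum overlap of 5, a 6-base adapter fragment at the end of the read is removed, but a 3-base fragment is left alone, since such a short match could easily occur by chance.</div>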
<div>
<div>My own real-world example:</div>
<blockquote>
<div>python ~/miniconda2/pkgs/cutadapt-1.10-py27_0/bin/cutadapt \</div>
<div>-a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -a ATCTCGTATGCCGTCTTCTGCTTG \</div>
<div>-A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -A CAAGCAGAAGACGGCATACGAGAT \</div>
<div>-e 0.1 -O 5 -m 50  -n  2 --pair-filter=both \</div>
<div>-o read_PE1.fq -p read_PE2.fq test1.fastq test2.fastq &gt;&amp; log.txt</div>
</blockquote>
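<div>The parameters are explained below:</div>
<div>The two -a options give the two adapters, and the two -A options give the reverse complements of those same two adapters.</div>
<div>-n 2 is added because there are two adapters: a single read in my data may contain both of them, so adapters need to be searched for twice.</div>
<div>-e 0.1 -O 5 -m 50 are fairly conventional settings; see the README for details (according to the help text below, the defaults are -e 0.1, -O 3, and -m 0). Setting -m to 50 means that a read shorter than 50 bases after adapter removal is discarded, since my data is PE150. This is also why adapters should be removed in PE mode: it keeps the number of reads in the two files balanced after filtering.</div>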
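<div>The relationship between the -a and -A sequences in the command above can be verified with a few lines of Python. The reverse_complement helper is written here for illustration; the four sequences are copied from the command.</div>

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


# The two -a adapters from the cutadapt command.
adapters_a = [
    "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "ATCTCGTATGCCGTCTTCTGCTTG",
]
# The two -A adapters from the same command.
adapters_A = [
    "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT",
    "CAAGCAGAAGACGGCATACGAGAT",
]

matches = [reverse_complement(a) == b for a, b in zip(adapters_a, adapters_A)]
print(matches)
```

<div>Both comparisons come out true, confirming that each -A sequence is the reverse complement of the corresponding -a sequence.</div>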
<div></div>
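<div>After removing the adapters, run FastQC again on the output files; it is obvious that the adapters are gone.</div>
<div></div>
<div>Of course, this software has many other features, which I will not go through one by one. See: <a href="http://cutadapt.readthedocs.io/en/stable/guide.html">http://cutadapt.readthedocs.io/en/stable/guide.html</a></div>
<div></div>
<div>The help text is fairly long:</div>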
<div>
<div>cutadapt version 1.9.1</div>
<div>Copyright (C) 2010-2015 Marcel Martin &lt;<a href="mailto:marcel.martin@scilifelab.se">marcel.martin@scilifelab.se</a>&gt;</div>
<div></div>
<div>cutadapt removes adapter sequences from high-throughput sequencing reads.</div>
<div></div>
<div>Usage:</div>
<div>    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq</div>
<div></div>
<div>For paired-end reads:</div>
<div>    cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq</div>
<div></div>
<div>Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC wildcard</div>
<div>characters are supported. The reverse complement is *not* automatically</div>
<div>searched. All reads from input.fastq will be written to output.fastq with the</div>
<div>adapter sequence removed. Adapter matching is error-tolerant. Multiple adapter</div>
<div>sequences can be given (use further -a options), but only the best-matching</div>
<div>adapter will be removed.</div>
<div></div>
<div>Input may also be in FASTA format. Compressed input and output is supported and</div>
<div>auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for</div>
<div>standard input/output. Without the -o option, output is sent to standard output.</div>
<div></div>
<div>Citation:</div>
<div></div>
<div>Marcel Martin. Cutadapt removes adapter sequences from high-throughput</div>
<div>sequencing reads. EMBnet.Journal, 17(1):10-12, May 2011.</div>
<div><a href="http://dx.doi.org/10.14806/ej.17.1.200">http://dx.doi.org/10.14806/ej.17.1.200</a></div>
<div></div>
<div>Use "cutadapt --help" to see all command-line options.</div>
<div>See <a href="http://cutadapt.readthedocs.org/">http://cutadapt.readthedocs.org/</a> for full documentation.</div>
<div></div>
<div>Options:</div>
<div>  --version             show program's version number and exit</div>
<div>  -h, --help            show this help message and exit</div>
<div>  --debug               Print debugging information.</div>
<div>  -f FORMAT, --format=FORMAT</div>
<div>                        Input file format; can be either 'fasta', 'fastq' or</div>
<div>                        'sra-fastq'. Ignored when reading csfasta/qual files</div>
<div>                        (default: auto-detect from file name extension).</div>
<div></div>
<div>  Options that influence how the adapters are found:</div>
<div>    Each of the three parameters -a, -b, -g can be used multiple times and</div>
<div>    in any combination to search for an entire set of adapters of possibly</div>
<div>    different types. Only the best matching adapter is trimmed from each</div>
<div>    read (but see the --times option). Instead of giving an adapter</div>
<div>    directly, you can also write file:FILE and the adapter sequences will</div>
<div>    be read from the given FASTA FILE.</div>
<div></div>
<div>    -a ADAPTER, --adapter=ADAPTER</div>
<div>                        Sequence of an adapter that was ligated to the 3' end.</div>
<div>                        The adapter itself and anything that follows is</div>
<div>                        trimmed. If the adapter sequence ends with the '$'</div>
<div>                        character, the adapter is anchored to the end of the</div>
<div>                        read and only found if it is a suffix of the read.</div>
<div>    -g ADAPTER, --front=ADAPTER</div>
<div>                        Sequence of an adapter that was ligated to the 5' end.</div>
<div>                        If the adapter sequence starts with the character '^',</div>
<div>                        the adapter is 'anchored'. An anchored adapter must</div>
<div>                        appear in its entirety at the 5' end of the read (it</div>
<div>                        is a prefix of the read). A non-anchored adapter may</div>
<div>                        appear partially at the 5' end, or it may occur within</div>
<div>                        the read. If it is found within a read, the sequence</div>
<div>                        preceding the adapter is also trimmed. In all cases,</div>
<div>                        the adapter itself is trimmed.</div>
<div>    -b ADAPTER, --anywhere=ADAPTER</div>
<div>                        Sequence of an adapter that was ligated to the 5' or</div>
<div>                        3' end. If the adapter is found within the read or</div>
<div>                        overlapping the 3' end of the read, the behavior is</div>
<div>                        the same as for the -a option. If the adapter overlaps</div>
<div>                        the 5' end (beginning of the read), the initial</div>
<div>                        portion of the read matching the adapter is trimmed,</div>
<div>                        but anything that follows is kept.</div>
<div>    -e ERROR_RATE, --error-rate=ERROR_RATE</div>
<div>                        Maximum allowed error rate (no. of errors divided by</div>
<div>                        the length of the matching region) (default: 0.1)</div>
<div>    --no-indels         Do not allow indels in the alignments (allow only</div>
<div>                        mismatches). (default: allow both mismatches and</div>
<div>                        indels)</div>
<div>    -n COUNT, --times=COUNT</div>
<div>                        Remove up to COUNT adapters from each read (default:</div>
<div>                        1)</div>
<div>    -O LENGTH, --overlap=LENGTH</div>
<div>                        Minimum overlap length. If the overlap between the</div>
<div>                        read and the adapter is shorter than LENGTH, the read</div>
<div>                        is not modified. This reduces the no. of bases trimmed</div>
<div>                        purely due to short random adapter matches (default:</div>
<div>                        3).</div>
<div>    --match-read-wildcards</div>
<div>                        Allow IUPAC wildcards in reads (default: False).</div>
<div>    -N, --no-match-adapter-wildcards</div>
<div>                        Do not interpret IUPAC wildcards in adapters.</div>
<div>    --no-trim           Match and redirect reads to output/untrimmed-output as</div>
<div>                        usual, but do not remove adapters.</div>
<div>    --mask-adapter      Mask adapters with 'N' characters instead of trimming</div>
<div>                        them.</div>
<div></div>
<div>  Additional read modifications:</div>
<div>    -u LENGTH, --cut=LENGTH</div>
<div>                        Remove LENGTH bases from the beginning or end of each</div>
<div>                        read. If LENGTH is positive, bases are removed from</div>
<div>                        the beginning of each read. If LENGTH is negative,</div>
<div>                        bases are removed from the end of each read. This</div>
<div>                        option can be specified twice if the LENGTHs have</div>
<div>                        different signs.</div>
<div>    -q [5'CUTOFF,]3'CUTOFF, --quality-cutoff=[5'CUTOFF,]3'CUTOFF</div>
<div>                        Trim low-quality bases from 5' and/or 3' ends of reads</div>
<div>                        before adapter removal. If one value is given, only</div>
<div>                        the 3' end is trimmed. If two comma-separated cutoffs</div>
<div>                        are given, the 5' end is trimmed with the first</div>
<div>                        cutoff, the 3' end with the second. See documentation</div>
<div>                        for the algorithm. (default: no trimming)</div>
<div>    --quality-base=QUALITY_BASE</div>
<div>                        Assume that quality values in FASTQ are encoded as</div>
<div>                        ascii(quality + QUALITY_BASE). This needs to be set to</div>
<div>                        64 for some old Illumina FASTQ files. Default: 33</div>
<div>    --trim-n            Trim N's on ends of reads.</div>
<div>    -x PREFIX, --prefix=PREFIX</div>
<div>                        Add this prefix to read names. Use {name} to insert</div>
<div>                        the name of the matching adapter.</div>
<div>    -y SUFFIX, --suffix=SUFFIX</div>
<div>                        Add this suffix to read names; can also include {name}</div>
<div>    --strip-suffix=STRIP_SUFFIX</div>
<div>                        Remove this suffix from read names if present. Can be</div>
<div>                        given multiple times.</div>
<div>    --length-tag=TAG    Search for TAG followed by a decimal number in the</div>
<div>                        description field of the read. Replace the decimal</div>
<div>                        number with the correct length of the trimmed read.</div>
<div>                        For example, use --length-tag 'length=' to correct</div>
<div>                        fields like 'length=123'.</div>
<div></div>
<div>  Options for filtering of processed reads:</div>
<div>    --discard-trimmed, --discard</div>
<div>                        Discard reads that contain an adapter. Also use -O to</div>
<div>                        avoid discarding too many randomly matching reads!</div>
<div>    --discard-untrimmed, --trimmed-only</div>
<div>                        Discard reads that do not contain the adapter.</div>
<div>    -m LENGTH, --minimum-length=LENGTH</div>
<div>                        Discard trimmed reads that are shorter than LENGTH.</div>
<div>                        Reads that are too short even before adapter removal</div>
<div>                        are also discarded. In colorspace, an initial primer</div>
<div>                        is not counted (default: 0).</div>
<div>    -M LENGTH, --maximum-length=LENGTH</div>
<div>                        Discard trimmed reads that are longer than LENGTH.</div>
<div>                        Reads that are too long even before adapter removal</div>
<div>                        are also discarded. In colorspace, an initial primer</div>
<div>                        is not counted (default: no limit).</div>
<div>    --max-n=COUNT       Discard reads with too many N bases. If COUNT is an</div>
<div>                        integer, it is treated as the absolute number of N</div>
<div>                        bases. If it is between 0 and 1, it is treated as the</div>
<div>                        proportion of N's allowed in a read.</div>
<div></div>
<div>  Options that influence what gets output to where:</div>
<div>    --quiet             Do not print a report at the end.</div>
<div>    -o FILE, --output=FILE</div>
<div>                        Write modified reads to FILE. FASTQ or FASTA format is</div>
<div>                        chosen depending on input. The summary report is sent</div>
<div>                        to standard output. Use '{name}' in FILE to</div>
<div>                        demultiplex reads into multiple files. (default:</div>
<div>                        trimmed reads are written to standard output)</div>
<div>    --info-file=FILE    Write information about each read and its adapter</div>
<div>                        matches into FILE. See the documentation for the file</div>
<div>                        format.</div>
<div>    -r FILE, --rest-file=FILE</div>
<div>                        When the adapter matches in the middle of a read,</div>
<div>                        write the rest (after the adapter) into FILE.</div>
<div>    --wildcard-file=FILE</div>
<div>                        When the adapter has N bases (wildcards), write</div>
<div>                        adapter bases matching wildcard positions to FILE.</div>
<div>                        When there are indels in the alignment, this will</div>
<div>                        often not be accurate.</div>
<div>    --too-short-output=FILE</div>
<div>                        Write reads that are too short (according to length</div>
<div>                        specified by -m) to FILE. (default: discard reads)</div>
<div>    --too-long-output=FILE</div>
<div>                        Write reads that are too long (according to length</div>
<div>                        specified by -M) to FILE. (default: discard reads)</div>
<div>    --untrimmed-output=FILE</div>
<div>                        Write reads that do not contain the adapter to FILE.</div>
<div>                        (default: output to same file as trimmed reads)</div>
<div></div>
<div>  Colorspace options:</div>
<div>    -c, --colorspace    Enable colorspace mode: Also trim the color that is</div>
<div>                        adjacent to the found adapter.</div>
<div>    -d, --double-encode</div>
<div>                        Double-encode colors (map 0,1,2,3,4 to A,C,G,T,N).</div>
<div>    -t, --trim-primer   Trim primer base and the first color (which is the</div>
<div>                        transition to the first nucleotide)</div>
<div>    --strip-f3          Strip the _F3 suffix of read names</div>
<div>    --maq, --bwa        MAQ- and BWA-compatible colorspace output. This</div>
<div>                        enables -c, -d, -t, --strip-f3 and -y '/1'.</div>
<div>    --no-zero-cap       Do not change negative quality values to zero in</div>
<div>                        colorspace data. By default, they are changed to zero</div>
<div>                        since many tools have problems with negative</div>
<div>                        qualities.</div>
<div>    -z, --zero-cap      Change negative quality values to zero. This is</div>
<div>                        enabled by default when -c/--colorspace is also</div>
<div>                        enabled. Use the above option to disable it.</div>
<div></div>
<div>  Paired-end options:</div>
<div>    The -A/-G/-B/-U options work like their -a/-b/-g/-u counterparts.</div>
<div></div>
<div>    -A ADAPTER          3' adapter to be removed from second read in a pair.</div>
<div>    -G ADAPTER          5' adapter to be removed from second read in a pair.</div>
<div>    -B ADAPTER          5'/3 adapter to be removed from second read in a pair.</div>
<div>    -U LENGTH           Remove LENGTH bases from the beginning or end of each</div>
<div>                        second read (see --cut).</div>
<div>    -p FILE, --paired-output=FILE</div>
<div>                        Write second read in a pair to FILE.</div>
<div>    --pair-filter=(any|both)</div>
<div>                        Which of the reads in a paired-end read have to match</div>
<div>                        the filtering criterion in order for it to be</div>
<div>                        filtered. Default: any.</div>
<div>    --interleaved       Read and write interleaved paired-end reads.</div>
<div>    --untrimmed-paired-output=FILE</div>
<div>                        Write second read in a pair to this FILE when no</div>
<div>                        adapter was found in the first read. Use this option</div>
<div>                        together with --untrimmed-output when trimming paired-</div>
<div>                        end reads. (Default: output to same file as trimmed</div>
<div>                        reads.)</div>
<div>    --too-short-paired-output=FILE</div>
<div>                        Write second read in a pair to this file if pair is</div>
<div>                        too short. Use together with --too-short-output.</div>
<div>    --too-long-paired-output=FILE</div>
<div>                        Write second read in a pair to this file if pair is</div>
<div>                        too long. Use together with --too-long-output.</div>
</div>
</div>
</div>
<p>&nbsp;</p>
<div></div>
<div>I think assembly is one of the things where you definitely want to remove adapters; if not, you'll have spurious overlaps creating erroneous contigs/transcripts, just because they share an adapter.</div>
<div>Adapter removal is very important for genome or transcriptome assembly:</div>
<div><a href="http://www.opiniomics.org/we-need-to-stop-making-this-simple-fcking-mistake/">http://www.opiniomics.org/we-need-to-stop-making-this-simple-fcking-mistake/</a></div>
<div><a href="http://grahametherington.blogspot.jp/2014/09/why-you-should-qc-your-reads-and-your.html">http://grahametherington.blogspot.jp/2014/09/why-you-should-qc-your-reads-and-your.html</a></div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1920.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
