<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Paired-end sequencing</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e5%8f%8c%e7%ab%af%e6%b5%8b%e5%ba%8f/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee.</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Filtering low-quality reads from paired-end sequencing data with sickle</title>
		<link>http://www.bio-info-trainee.com/1914.html</link>
		<comments>http://www.bio-info-trainee.com/1914.html#comments</comments>
		<pubDate>Thu, 06 Oct 2016 13:47:26 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic software]]></category>
		<category><![CDATA[QC]]></category>
		<category><![CDATA[sickle]]></category>
		<category><![CDATA[Paired-end sequencing]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1914</guid>
		<description><![CDATA[Generally, QC of sequencing data covers three main areas: Quality trimmin &#8230; <a href="http://www.bio-info-trainee.com/1914.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
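				<content:encoded><![CDATA[<p>Generally, quality control (QC) of sequencing data covers three main areas: quality trimming, adapter removal, and contaminant filtering. With paired-end data, discarding low-quality reads can easily leave the forward and reverse files out of sync, because a read may be dropped from one file while its mate stays in the other. sickle handles this problem well, and it is very simple to use.<span id="more-1914"></span></p>
<p>Installation commands:</p>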
<blockquote><p>## https://github.com/najoshi/sickle<br />
cd ~/biosoft<br />
mkdir sickle &amp;&amp; cd sickle<br />
wget https://codeload.github.com/najoshi/sickle/zip/master -O sickle.zip<br />
unzip sickle.zip<br />
cd sickle-master<br />
make<br />
~/biosoft/sickle/sickle-master/sickle -h</p></blockquote>
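<p>The software is simple: it removes low-quality reads while keeping the read pairs of a paired-end run intact.</p>
<p>The test data for this example can be downloaded from <a href="http://www.biotrainee.com/jmzeng/reads/test1.fastq" target="_blank">http://www.biotrainee.com/jmzeng/reads/test1.fastq</a> and <a href="http://www.biotrainee.com/jmzeng/reads/test2.fastq" target="_blank">http://www.biotrainee.com/jmzeng/reads/test2.fastq</a>.</p>
<p>sickle supports gz-compressed input. I should have compressed the files before uploading them to our cloud server to save bandwidth, but this is only a test. If the transfer load becomes too heavy, we may remove the links and share the data through Baidu Cloud instead. The paired-end command is:<br />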
~/biosoft/sickle/sickle-master/sickle pe -f test1.fastq -r test2.fastq -t sanger -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq</p>
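<p>After a run like the one above, a quick sanity check is that the two trimmed files still contain the same number of records. The sketch below is only an illustration: the helper names <em>count_reads</em> and <em>check_pairs</em> are made up for this example, and it assumes plain, uncompressed FASTQ with exactly four lines per record.</p>

```shell
# Count FASTQ records in a file, assuming exactly 4 lines per record.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Hypothetical helper: succeed only when both files hold the same number of records.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads, $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Example call on the outputs of the command above:
# check_pairs trimmed_output_file1.fastq trimmed_output_file2.fastq
```

<p>This compares record counts only; it does not verify that the read names in the two files match line by line.</p>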
<p>The log printed by sickle:</p>
<blockquote><p>PE forward file: test1.fastq<br />
PE reverse file: test2.fastq</p>
<p>Total input FastQ records: 200000 (100000 pairs)</p>
<p>FastQ paired records kept: 192262 (96131 pairs)<br />
FastQ single records kept: 3869 (from PE1: 3864, from PE2: 5)<br />
FastQ paired records discarded: 0 (0 pairs)<br />
FastQ single records discarded: 3869 (from PE1: 5, from PE2: 3864)</p></blockquote>
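<p>To turn the log above into a single retention figure, the pair counts can be pulled out with awk. This is a minimal sketch: the function name <em>pair_retention</em> is hypothetical, and it assumes the log wording is exactly as shown above.</p>

```shell
# Hypothetical helper: read a sickle PE log on stdin and report pair retention.
pair_retention() {
    awk -F'[()]' '
        /Total input FastQ records/  { split($2, a, " "); total = a[1] }
        /FastQ paired records kept/  { split($2, a, " "); kept = a[1] }
        END { printf "pairs kept: %d of %d (%.2f%%)\n", kept, total, 100 * kept / total }
    '
}

# The two lines below are copied from the log above.
printf '%s\n' \
    'Total input FastQ records: 200000 (100000 pairs)' \
    'FastQ paired records kept: 192262 (96131 pairs)' | pair_retention
```

<p>For this test data it prints "pairs kept: 96131 of 100000 (96.13%)". The remaining 3869 pairs each lost one mate, which matches the single-record lines in the log.</p>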
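<p>Next, generate FastQC quality reports for the files before and after trimming in one batch (-n 1 passes one file per fastqc process, so that -P 5 can actually run five in parallel):<br />
ls *fastq | xargs -n 1 -P 5 ~/biosoft/fastqc/FastQC/fastqc</p>
<p>Comparing the reports for all the fastq files shows what sickle did.</p>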
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/QQ截图20161006214646.png"><img class="alignnone size-full wp-image-1915" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/10/QQ截图20161006214646.png" alt="QQ screenshot 20161006214646" width="377" height="230" /></a></p>
<p>The files from before and after sickle processing can be downloaded from <a href="http://www.biotrainee.com/jmzeng/sickle/sickle-results.zip" target="_blank">http://www.biotrainee.com/jmzeng/sickle/sickle-results.zip</a>.</p>
<p>Of course, this is only the default-parameter usage; many more options can be added, for example -q 30 -l 15. The parameters are explained below:</p>
<ul>
<li>
<blockquote><p>pe : use paired-end mode</p></blockquote>
</li>
<li>
<blockquote><p>-f training/rnaseq/ERR022486_chr22_read1.fastq.gz : the fastq file for read 1</p></blockquote>
</li>
<li>
<blockquote><p>-r training/rnaseq/ERR022486_chr22_read2.fastq.gz : the fastq file for read 2</p></blockquote>
</li>
<li>
<blockquote><p>-t sanger : the quality encoding. All data downloaded from EBI or NCBI will be "sanger" encoded. For an explanation: <a href="http://en.wikipedia.org/wiki/FASTQ_format#Encoding">http://en.wikipedia.org/wiki/FASTQ_format#Encoding</a></p></blockquote>
</li>
<li>
<blockquote><p>-o ERR022486_chr22_read1_trim.fastq : the output file for trimmed reads from read 1</p></blockquote>
</li>
<li>
<blockquote><p>-p ERR022486_chr22_read2_trim.fastq : the output file for trimmed reads from read 2</p></blockquote>
</li>
<li>
<blockquote><p>-s ERR022486_chr22_single_trim.fastq : the output file for reads where the mate has failed the quality or length filter</p></blockquote>
</li>
<li>
<blockquote><p>-q 30 : the quality value to use.  Bases below this will be trimmed, using a sliding window</p></blockquote>
</li>
<li>
<blockquote><p>-l 15 : the minimum length allowed after trimming. Here we remove reads shorter than 15 bp</p></blockquote>
</li>
</ul>
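<p>The options listed above can be combined into one command. The sketch below is an illustration under assumptions: the wrapper <em>trim_pairs</em>, the variables SICKLE and DRY_RUN, and the output file naming are all invented for this example, while the sickle flags themselves are the ones explained in the list. With DRY_RUN set, the command is only printed, so the sketch can be tried without sickle installed.</p>

```shell
# SICKLE and DRY_RUN are hypothetical helper variables, not part of sickle itself.
SICKLE="${SICKLE:-sickle}"

# Hypothetical wrapper: build a paired-end sickle command with explicit thresholds.
# $1/$2: input read 1 / read 2; $3: output prefix; $4: quality; $5: minimum length
trim_pairs() {
    set -- "$SICKLE" pe -f "$1" -r "$2" -t sanger \
        -o "${3}_1.fastq" -p "${3}_2.fastq" -s "${3}_single.fastq" \
        -q "$4" -l "$5"
    if [ -n "$DRY_RUN" ]; then
        echo "$@"      # only print the command
    else
        "$@"           # actually run sickle
    fi
}

# Print the command for the test data with -q 30 -l 15, without running it.
DRY_RUN=1 trim_pairs test1.fastq test2.fastq trimmed 30 15
```

<p>Dropping DRY_RUN runs the command for real; pointing SICKLE at ~/biosoft/sickle/sickle-master/sickle uses the binary built in the installation step.</p>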
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1914.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
