<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; MOSAIK</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/mosaik/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>MOSAIK, a new alignment tool</title>
		<link>http://www.bio-info-trainee.com/1457.html</link>
		<comments>http://www.bio-info-trainee.com/1457.html#comments</comments>
		<pubDate>Tue, 15 Mar 2016 10:55:20 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Basic software]]></category>
		<category><![CDATA[alignment]]></category>
		<category><![CDATA[MOSAIK]]></category>
		<category><![CDATA[sam]]></category>
		<category><![CDATA[read mapping]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1457</guid>
		<description><![CDATA[Function: sequence alignment, similar to BWA and Bowtie. Advantage: works across all platforms, even supporting PacBio &#8230; <a href="http://www.bio-info-trainee.com/1457.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
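				<content:encoded><![CDATA[<div>Function: sequence alignment, similar to BWA and Bowtie</div>
<div>Advantage: works with all sequencing platforms, even the long third-generation reads from PacBio</div>
<div>Algorithm: a hash-based index, unlike the BWT-based aligners</div>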
<div>Website: <a href="https://github.com/wanpinglee/MOSAIK" target="_blank">https://github.com/wanpinglee/MOSAIK</a></div>
<div>Paper: <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090581">http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090581</a></div>
<div>
<div>
<div>Author: WP Lee - 2014 - <a href="https://scholar.google.com/scholar?um=1&amp;ie=UTF-8&amp;lr&amp;cites=8963892741176779202">Cited by 70</a> - <a href="https://scholar.google.com/scholar?um=1&amp;ie=UTF-8&amp;lr&amp;q=related:wmGLkvQkZnzaqM:scholar.google.com/">Related articles</a></div>
</div>
</div>
<p><span id="more-1457"></span></p>
<div>
<pre>Overview:

MOSAIK is a stable, sensitive and open-source program for mapping second and 
third-generation sequencing reads to a reference genome. Uniquely among current 
mapping tools, MOSAIK can align reads generated by all the major sequencing 
technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, 
Ion Torrent and Pacific BioSciences SMRT.</pre>
</div>
<h1><span style="color: #ff0000;">1. Installation</span></h1>
<div>
<div>Download link: <a href="https://github.com/wanpinglee/MOSAIK/archive/master.zip">https://github.com/wanpinglee/MOSAIK/archive/master.zip</a></div>
</div>
<div>Download the archive, unpack it, change into the src source directory, and run make.</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/11.png"><img class="alignnone size-full wp-image-1458" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/11.png" alt="1" width="389" height="153" /></a></div>
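<div>These programs are now ready to use.</div>
<div>There are four programs, so a complete alignment takes four steps:</div>
<div>MosaikBuild and MosaikJump build the index for the reference genome.</div>
<div>MosaikBuild is also run on the sequencing reads, converting them into MOSAIK's binary format (referred to below as indexing the reads).</div>
<div>MosaikAligner aligns the converted reads against the indexed reference.</div>
<div>MosaikText converts the alignment result into another readable format, usually SAM.</div>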
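<div>Before moving on, it can help to confirm that the build produced all four programs. The following is a minimal sketch, not part of MOSAIK itself: the function name check_mosaik_bins is hypothetical, and it assumes the binaries carry the names used in the commands in this post (MosaikBuild, MosaikJump, MosaikAligner, MosaikText).</div>

```shell
# Report which of the four MOSAIK programs are missing from a bin directory.
# check_mosaik_bins is a hypothetical helper written for this post.
check_mosaik_bins() {
    bin_dir=$1
    missing=""
    for prog in MosaikBuild MosaikJump MosaikAligner MosaikText; do
        # -x is true only if the file exists and is executable
        if [ ! -x "$bin_dir/$prog" ]; then
            missing="$missing $prog"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all four programs found"
}
```

<div>With the install location used in this post, the call would be check_mosaik_bins ~/bio-soft/MOSAIK/bin; a non-zero exit status means at least one program was not built.</div>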
<h1><span style="color: #ff0000;">2. Preparing the input data</span></h1>
<div>Alignment requires the sequencing reads in FASTQ format and the reference genome in FASTA format.</div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/21.png"><img class="alignnone size-full wp-image-1459" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/03/21.png" alt="2" width="554" height="202" /></a></div>
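<div>I downloaded the data from <a href="http://odin.mdacc.tmc.edu/~xsu1/VirusSeq.html" target="_blank">http://odin.mdacc.tmc.edu/~xsu1/VirusSeq.html</a>. The reason for using this tool in the first place is the need to find viral integration in the human genome.</div>
<div>The reads are paired-end (PE); the reference genomes are viral and human.</div>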
<h1><span style="color: #ff0000;">3. Running the commands</span></h1>
<div>A complete script follows.</div>
<div><span style="color: #4f81bd; font-size: medium;"><b>First, build the index for the reference genome</b></span></div>
<div>
<blockquote>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">Mosaik_bin=~/bio-soft/MOSAIK/bin  # set this to the MOSAIK install directory</span></div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">##for gib virus reference genome</span></div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">$Mosaik_bin/MosaikBuild -fr gibVirus.fa -oa gibVirus.fa.bin -st illumina -assignQual 40</span></div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">$Mosaik_bin/MosaikJump -ia gibVirus.fa.bin -out gibVirus.JumpDb -hs 15</span></div>
</blockquote>
</div>
<blockquote>
<div>These two steps build the hash index. For this collection of viral genomes (a 60 MB compressed file), the run times were:</div>
<div>
<div>MosaikBuild CPU time: 15.660 s, wall time: 18.146 s</div>
</div>
<div>
<div>MosaikJump CPU time: 329.031 s, wall time: 331.672 s</div>
<div>The run time is acceptable, but the size of the output index files is much harder to accept:</div>
</div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">333M Mar 11 19:55 gibVirus.fa.bin</span></div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">60M Aug 13  2013 gibVirus.fa.gz</span></div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">5.0G Mar 11 20:04 gibVirus.JumpDb_keys.jmp</span></div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">1 Mar 11 19:59 gibVirus.JumpDb_meta.jmp</span></div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">1.3G Mar 11 20:04 gibVirus.JumpDb_positions.jmp</span></div>
<div>For the human hg19 genome, the run times were:</div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">MosaikBuild CPU time: 183.642 s, wall time: 184.658 s</span></div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">MosaikJump CPU time: 3985.608 s, wall time: 3995.323 s</span></div>
<div><span style="font-family: Monaco,Consolas,Courier,Lucida Console,monospace;">A little over an hour, which is fine.</span></div>
</blockquote>
<p><span style="color: #4f81bd; font-size: medium;"><b>With the reference genome indexed, the sequencing reads need to be indexed as well</b></span></p>
<div>
<blockquote>
<div>$Mosaik_bin/MosaikBuild  -q L526401A_1.fq.gz -q2 L526401A_2.fq.gz -out L526401A.bin -st illumina</div>
</blockquote>
</div>
<blockquote>
<div>The data are paired-end, about 1.6 GB per file. Indexing them took:</div>
</blockquote>
<div>
<blockquote>
<div># reads written:          53060622</div>
<div># bases written:        5304891143</div>
<div></div>
<div>MosaikBuild CPU time: 388.969 s, wall time: 391.149 s</div>
</blockquote>
</div>
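<div>The two counters in this log give a quick sanity check on the input: dividing bases written by reads written yields the mean read length. A minimal sketch using awk; the function name mean_read_length is hypothetical, and the numbers are the ones printed by MosaikBuild above.</div>

```shell
# Mean read length = bases written / reads written.
# mean_read_length is a hypothetical helper; awk does the floating-point division.
mean_read_length() {
    awk -v bases="$1" -v reads="$2" 'BEGIN { printf "%.1f\n", bases / reads }'
}

# Values taken from the MosaikBuild log in this post
mean_read_length 5304891143 53060622   # prints 100.0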
<p><span style="color: #4f81bd; font-size: medium;"><b>Next, run the alignment</b></span></p>
<div>
<blockquote>
<div>ANN_PATH=~/bio-soft/MOSAIK/src/networkFile</div>
<div>$Mosaik_bin/MosaikAligner -in L526401A.bin  \</div>
<div>-out L526401A.bin.aligned \</div>
<div>-ia ../Mosaik_JumpDb/hg19Virus.fa.bin \</div>
<div>-j ../Mosaik_JumpDb/hg19Virus.JumpDb \</div>
<div>-annpe $ANN_PATH/2.1.26.pe.100.0065.ann -annse $ANN_PATH/2.1.26.se.100.005.ann</div>
</blockquote>
</div>
<p><span style="color: #4f81bd; font-size: medium;"><b>The alignment result is the file L526401A.bin.aligned, which still needs to be converted to SAM format with MosaikText so that it is easy to read</b></span></p>
<div>
<blockquote>
<div>$Mosaik_bin/MosaikText -in L526401A.bin.aligned -sam L526401A.bin.aligned.sam -u</div>
</blockquote>
</div>
<blockquote>
<div>The GitHub repository also includes test data; running through it once makes the whole workflow clear.</div>
<div></div>
</blockquote>
<h1><span style="color: #ff0000;">4. Interpreting the results</span></h1>
<div>The output is in standard SAM format, so no further explanation is needed.</div>
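<div>For the viral-integration use case, a first look at the SAM output is often just a count of mapped records per reference sequence. The following is a minimal sketch under stated assumptions: the function name count_by_reference is hypothetical, the input is assumed to be an uncompressed text SAM file, and it relies only on the SAM column layout (column 2 is FLAG, column 3 is the reference name).</div>

```shell
# Count mapped records per reference sequence in a SAM file (or stdin).
# count_by_reference is a hypothetical helper written for this post.
count_by_reference() {
    awk -F '\t' '
        /^@/ { next }                    # skip header lines
        $3 == "*" { next }               # no reference name assigned
        int($2 / 4) % 2 == 1 { next }    # FLAG bit 0x4 set: record is unmapped
        { n[$3]++ }                      # tally by reference name
        END { for (r in n) print r "\t" n[r] }
    ' "$@" | LC_ALL=C sort
}
```

<div>For example, count_by_reference L526401A.bin.aligned.sam prints one tab-separated line per reference name with its record count, which quickly shows whether any reads landed on viral sequences.</div>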
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1457.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
