<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Repeat Sequence Masking</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e9%87%8d%e5%a4%8d%e5%ba%8f%e5%88%97%e5%b1%8f%e8%94%bd/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
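	<description>Feel free to leave comments and join the discussion on the forum biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>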
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Repeat Sequence Masking, Part 1: Experimenting with Some RepeatMasker Parameters</title>
		<link>http://www.bio-info-trainee.com/589.html</link>
		<comments>http://www.bio-info-trainee.com/589.html#comments</comments>
		<pubDate>Wed, 01 Apr 2015 13:52:39 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Genomics]]></category>
		<category><![CDATA[repeatmasker]]></category>
		<category><![CDATA[Repeat sequence masking]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=589</guid>
		<description><![CDATA[This is an article I wrote a long time ago. I am posting it here first and will then walk through a worked example. Part 1: RepeatM &#8230; <a href="http://www.bio-info-trainee.com/589.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
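				<content:encoded><![CDATA[<p>This is an article I wrote a long time ago. I am posting it here first and will then walk through a worked example.</p>
<p>Part 1: Comparing RepeatMasker results under different parameters</p>
<p>The input is a single zebrafish sequence, sequence.fasta, downloaded more or less at random from NCBI.</p>
<p>Running with no parameters at all gives: RepeatMasker   sequence.fasta</p>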
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索138.png"><img class="alignnone size-full wp-image-590" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索138.png" alt="repeat-masker参数摸索138" width="505" height="598" /></a></p>
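<p>With the species parameter added, the result is: RepeatMasker -species Danio  sequence.fasta</p>
<p>The effect is immediately visible: the first run reported less than 10% of the sequence as repeats, whereas this run reports more than 70%. Without -species, RepeatMasker searches its default library rather than zebrafish repeats, so you must pick the matching species or you will miss most of the repeats you are looking for.</p>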
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索267.png"><img class="alignnone size-full wp-image-591" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索267.png" alt="repeat-masker参数摸索267" width="536" height="525" /></a></p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索269.png"><img class="alignnone size-full wp-image-592" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索269.png" alt="repeat-masker参数摸索269" width="554" height="331" /></a></p>
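<p>Adding the -low parameter as well: RepeatMasker -species Danio -low  sequence.fasta</p>
<p>Not much seems to change; only a few hits disappear. That is expected, because -low tells RepeatMasker not to mask low-complexity DNA or simple repeats.</p>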
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索349.png"><img class="alignnone size-full wp-image-593" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索349.png" alt="repeat-masker参数摸索349" width="551" height="526" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索351.png"><img class="alignnone size-full wp-image-594" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索351.png" alt="repeat-masker参数摸索351" width="533" height="332" /></a></p>
<p>Comparing the -div parameter: RepeatMasker -species Danio  sequence.fasta</p>
<p>RepeatMasker -species Danio -div 10  sequence.fasta</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索459.png"><img class="alignnone size-full wp-image-595" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索459.png" alt="repeat-masker参数摸索459" width="554" height="197" /></a></p>
<p>And after adding -div 10:</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索475.png"><img class="alignnone size-full wp-image-596" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索475.png" alt="repeat-masker参数摸索475" width="553" height="218" /></a></p>
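<p>-div 10 masks only repeats that are less than 10% diverged from the consensus, so every row whose second column (divergence) is 10% or higher is dropped.</p>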
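<p>To make the effect of -div concrete, here is a minimal Python sketch that applies the same kind of cut-off to rows of an existing .out file after the fact. It is not how RepeatMasker implements the option. The function name filter_by_divergence is my own, the second example row (HypotheticalRep) is invented, and the repeat-position columns of the first row are placeholders, since the original run did not record them.</p>

```python
def filter_by_divergence(out_lines, max_div):
    """Keep .out data rows whose divergence (2nd column) is below max_div."""
    kept = []
    for line in out_lines:
        fields = line.split()
        # Data rows start with the integer SW score; header rows do not.
        if not fields or not fields[0].isdigit():
            continue
        if max_div > float(fields[1]):
            kept.append(line)
    return kept


# Illustrative rows only: the second hit and all repeat positions are made up.
example = [
    "   SW   perc perc perc  query  position in query ...",
    " 2555    5.7  0.0  0.0  seq1  373  690  (50140)  C  DNA13TA1a_DR  DNA/TcMar-Tc1  (0)  318  1  1",
    "  301   18.2  1.0  0.0  seq1  900  980  (49850)  +  HypotheticalRep  DNA  1  81  (200)  2",
]

for row in filter_by_divergence(example, 10):
    cols = row.split()
    print(cols[9], cols[1])  # prints: DNA13TA1a_DR 5.7
```

<p>Filtering afterwards only changes which rows you look at; unlike -div, it does not change what gets masked in the .masked file.</p>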
<p>Output options: by default, the repeat regions are masked with N.</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索518.png"><img class="alignnone size-full wp-image-597" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索518.png" alt="repeat-masker参数摸索518" width="505" height="436" /></a></p>
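<p>With the -x option, however, every position that would have been N in the output becomes X instead. This option does not seem very useful to me.</p>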
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索560.png"><img class="alignnone size-full wp-image-598" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索560.png" alt="repeat-masker参数摸索560" width="498" height="287" /></a></p>
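<p>There are a few similar options that, in my view, also matter little. With -xsmall, repeat regions are written in lowercase letters (soft masking), so N is no longer needed to hide them.</p>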
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索613.png"><img class="alignnone size-full wp-image-599" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索613.png" alt="repeat-masker参数摸索613" width="507" height="403" /></a></p>
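<p>The three masking styles (N, X with -x, lowercase with -xsmall) are easy to tell apart programmatically. The sketch below counts them in a sequence string; masking_summary is a hypothetical helper name and the example sequence is made up. It assumes the input sequence was all uppercase with no N of its own before masking.</p>

```python
def masking_summary(sequence):
    """Count hard-masked (N or X) and soft-masked (lowercase) bases."""
    hard = sum(1 for base in sequence if base in "NX")
    soft = sum(1 for base in sequence if base.islower())
    return {"hard": hard, "soft": soft, "total": len(sequence)}


# Made-up sequence: 4 unmasked bases, 4 N, 4 lowercase, 2 X.
summary = masking_summary("ACGTNNNNacgtXX")
print(summary)  # prints: {'hard': 6, 'soft': 4, 'total': 14}
```

<p>Dividing hard or soft by total gives a quick estimate of the masked fraction, which can be compared with the percentage reported in the .tbl file.</p>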
<p>With the -a option, one extra file is produced</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索637.png"><img class="alignnone size-full wp-image-600" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索637.png" alt="repeat-masker参数摸索637" width="230" height="76" /></a></p>
<p>Opening it shows the following content</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索648.png"><img class="alignnone size-full wp-image-601" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索648.png" alt="repeat-masker参数摸索648" width="553" height="348" /></a></p>
<p>The alignments are in the cross_match/SWAT format, in which mismatches rather than matches are indicated: transitions with an i and transversions with a v. Note that there are some differences between the alignment file and the map file. The map file is produced by ProcessRepeats, whose main task is to defragment the original map file, whereas the alignment file is created from the original map file; the differences between them come from the defragmented hits.</p>
<p>Adding -poly also produces one extra file</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1139.png"><img class="alignnone size-full wp-image-602" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1139.png" alt="repeat-masker参数摸索1139" width="232" height="40" /></a></p>
<p>Opening it shows that it lists the microsatellites in a separate table</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1159.png"><img class="alignnone size-full wp-image-603" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1159.png" alt="repeat-masker参数摸索1159" width="554" height="137" /></a></p>
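<p>The ‘-xm’, ‘-ace’, and ‘-gff’ options create an additional output file in cross_match, ACeDB, and GFF (General Feature Format) format, respectively. All of these options exist to produce files suited to downstream processing.</p>
<p>In addition, for large files you may need -pa to run several processes in parallel, or -s, -q, or -qq (slow, quick, and rush searches) to trade sensitivity against speed.</p>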
<p>&nbsp;</p>
<p>Part 2: Explanation of the output files</p>
<p>The run produces the following files</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1387.png"><img class="alignnone size-full wp-image-604" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1387.png" alt="repeat-masker参数摸索1387" width="553" height="26" /></a></p>
<p>1. The .out file</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1399.png"><img class="alignnone size-full wp-image-605" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1399.png" alt="repeat-masker参数摸索1399" width="554" height="201" /></a></p>
<table>
<tbody>
<tr>
<td width="142">SW score</td>
<td width="248">Smith-Waterman alignment score</td>
<td width="161">2555</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Div%</td>
<td width="248">Percentage of substitutions in the matched region relative to the consensus sequence</td>
<td width="161">5.7</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Del%</td>
<td width="248">Percentage of bases opposite a gap in the query sequence (deleted bases)</td>
<td width="161">0.0</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Ins%</td>
<td width="248">Percentage of bases opposite a gap in the repeat library sequence (inserted bases)</td>
<td width="161">0.0</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Query sequence</td>
<td width="248">Name of the input sequence being masked</td>
<td width="161">gi|211853417|emb|CU633477.14|</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Position begin</td>
<td width="248">Start of the match in the query sequence</td>
<td width="161">373</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Position end</td>
<td width="248">End of the match in the query sequence</td>
<td width="161">690</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Query left</td>
<td width="248">Number of bases in the query sequence beyond the matched region</td>
<td width="161">(50140)</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Strand</td>
<td width="248">+ = match on the sense strand of the library repeat; C = match on the complement</td>
<td width="161">C</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Matching repeat</td>
<td width="248">Name of the matched repeat</td>
<td width="161">DNA13TA1a_DR</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Repeat family(class)</td>
<td width="248">Class and family of the matched repeat</td>
<td width="161">DNA/TcMar-Tc1</td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Position begin</td>
<td width="248">Start of the match in the repeat consensus</td>
<td width="161"></td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Position end</td>
<td width="248">End of the match in the repeat consensus</td>
<td width="161"></td>
<td width="16"></td>
</tr>
<tr>
<td width="142">Repeat left</td>
<td width="248">Number of bases in the repeat consensus beyond the matched region</td>
<td width="161"></td>
<td width="16"></td>
</tr>
<tr>
<td width="142">ID</td>
<td width="248">Sequential ID of the match</td>
<td width="161"></td>
<td width="16"></td>
</tr>
</tbody>
</table>
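<p>As a rough illustration of the column layout above, the following Python sketch parses the first eleven whitespace-separated columns of one .out row. The names RepeatHit, strip_parens, and parse_out_line are my own; the example row uses only the values from the table, and the repeat-position columns are left out because they were not recorded there.</p>

```python
from typing import NamedTuple


class RepeatHit(NamedTuple):
    sw_score: int
    perc_div: float
    perc_del: float
    perc_ins: float
    query: str
    query_begin: int
    query_end: int
    query_left: int
    strand: str
    repeat: str
    repeat_class: str


def strip_parens(token):
    """Turn a token such as '(50140)' into the integer 50140."""
    return int(token.strip("()"))


def parse_out_line(line):
    """Parse the first eleven columns of one .out data row."""
    f = line.split()
    return RepeatHit(
        int(f[0]), float(f[1]), float(f[2]), float(f[3]),
        f[4], int(f[5]), int(f[6]), strip_parens(f[7]),
        f[8], f[9], f[10],
    )


row = ("2555 5.7 0.0 0.0 gi|211853417|emb|CU633477.14| "
       "373 690 (50140) C DNA13TA1a_DR DNA/TcMar-Tc1")
hit = parse_out_line(row)
# Length of the masked interval on the query, inclusive of both ends.
print(hit.repeat, hit.strand, hit.query_end - hit.query_begin + 1)
# prints: DNA13TA1a_DR C 318
```

<p>Splitting on whitespace is enough here because none of the columns contain spaces; a full parser would also skip the header lines and read the remaining repeat-position and ID columns.</p>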
<p>2. The .cat file is basically similar to the .out file<br />
3. The .tbl file</p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1917.png"><img class="alignnone size-full wp-image-606" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1917.png" alt="repeat-masker参数摸索1917" width="550" height="440" /></a> <a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1919.png"><img class="alignnone size-full wp-image-607" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索1919.png" alt="repeat-masker参数摸索1919" width="554" height="395" /></a><br />
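4. The .masked file is the input sequence with the detected repeats replaced by N, or by another form of masking chosen with the options above</p>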
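<p>The .polyout file simply lists the microsatellites in a separate table</p>
<p>The .align file essentially takes each record of the .out file and expands it into the full alignment behind that record</p>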
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索2027.png"><img class="alignnone size-full wp-image-608" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/04/repeat-masker参数摸索2027.png" alt="repeat-masker参数摸索2027" width="554" height="360" /></a></p>
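<p>It lists the nucleotide sequence from position 373 to 690 and shows exactly how it aligns to the DNA13TA1a_DR repeat</p>
<p>At the time I did not understand what the i and v marks meant. According to the description quoted in Part 1, i marks a transition and v marks a transversion.</p>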
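<p>The i/v convention can be reproduced with a few lines of Python. This is only a sketch of the idea, not the code cross_match uses: classify_substitution and mismatch_track are hypothetical names, the two sequences are made up, and the inputs are assumed to be already aligned and of equal length. A transition swaps two purines (A, G) or two pyrimidines (C, T); any other substitution is a transversion.</p>

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'i' for a transition, 'v' for a transversion, '' otherwise."""
    a, b = a.upper(), b.upper()
    # Identical bases and gap columns are not substitutions.
    if a == b or "-" in (a, b):
        return ""
    pair = {a, b}
    same_class = pair.issubset(PURINES) or pair.issubset(PYRIMIDINES)
    return "i" if same_class else "v"


def mismatch_track(query, consensus):
    """Build the line of i/v marks shown between two aligned sequences."""
    return "".join(
        classify_substitution(q, c) or " " for q, c in zip(query, consensus)
    )


# Made-up aligned pair: C/T is a transition, A/C is a transversion.
print(repr(mismatch_track("ACGTAC", "ATGTCC")))  # prints: ' i  v '
```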
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/589.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
