<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; Sex chromosome homology</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/%e6%80%a7%e6%9f%93%e8%89%b2%e4%bd%93%e5%90%8c%e6%ba%90/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Simulating chromosome Y sequencing reads and aligning them to chromosome X to assess homology</title>
		<link>http://www.bio-info-trainee.com/1081.html</link>
		<comments>http://www.bio-info-trainee.com/1081.html#comments</comments>
		<pubDate>Wed, 28 Oct 2015 02:34:35 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Sex chromosome homology]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1081</guid>
		<description><![CDATA[First, download the two chromosome sequences: wget http://hgdownload.cse.u &#8230; <a href="http://www.bio-info-trainee.com/1081.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><strong>First, download the two chromosome sequences</strong></p>
<p>wget <a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chrX.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chrX.fa.gz</a></p>
<p>wget <a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chrY.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chrY.fa.gz</a></p>
<p>After decompressing (for example with gunzip), the two FASTA files are:</p>
<p>152M Mar 21  2009 chrX.fa</p>
<p>58M Mar 21  2009 chrY.fa</p>
<p><strong>Next, build a bwa index for chromosome X</strong></p>
<p><strong>bwa index chrX.fa </strong></p>
<p>[bwa_index] Pack FASTA... 1.97 sec</p>
<p>[bwa_index] Construct BWT for the packed sequence...</p>
<p>[BWTIncCreate] textLength=310541120, availableWord=33850812</p>
<p>[BWTIncConstructFromPacked] 10 iterations done. 55838672 characters processed.</p>
<p>[BWTIncConstructFromPacked] 20 iterations done. 103157920 characters processed.</p>
<p>[BWTIncConstructFromPacked] 30 iterations done. 145211344 characters processed.</p>
<p>[BWTIncConstructFromPacked] 40 iterations done. 182584528 characters processed.</p>
<p>[BWTIncConstructFromPacked] 50 iterations done. 215797872 characters processed.</p>
<p>[BWTIncConstructFromPacked] 60 iterations done. 245313968 characters processed.</p>
<p>[BWTIncConstructFromPacked] 70 iterations done. 271543920 characters processed.</p>
<p>[BWTIncConstructFromPacked] 80 iterations done. 294853104 characters processed.</p>
<p>[bwt_gen] Finished constructing BWT in 88 iterations.</p>
<p>[bwa_index] 98.58 seconds elapse.</p>
<p>[bwa_index] Update BWT... 0.96 sec</p>
<p>[bwa_index] Pack forward-only FASTA... 0.91 sec</p>
<p>[bwa_index] Construct SA from BWT and Occ... 33.18 sec</p>
<p>[main] Version: 0.7.8-r455</p>
<p>[main] CMD: /lrlhps/apps/bioinfo/bwa/bwa-0.7.8/bwa index chrX.fa</p>
<p>[main] Real time: 141.623 sec; CPU: 135.605 sec</p>
<p>Because chromosome X is only about 152 MB, indexing is fast: it finished in a little over two minutes (141.6 s real time).</p>
<p><strong>Next, simulate sequencing reads from chromosome Y (PE100, insert 400)</strong></p>
<p>209M Oct 28 10:19 read1.fa</p>
<p>209M Oct 28 10:19 read2.fa</p>
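<p>The simulation script is simple. It walks along chromosome Y four times in steps of 50 to 59 bp, and at each position takes a 100 bp read 1 plus a 100 bp read 2 that starts 400 bp downstream and is reverse-complemented, so each pair spans a 500 bp fragment.</p>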
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2015/10/tmp.png"><img class="alignnone size-full wp-image-1083" src="http://www.bio-info-trainee.com/wp-content/uploads/2015/10/tmp.png" alt="tmp" width="559" height="417" /></a></p>
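<p>#!/usr/bin/perl<br />
use strict;<br />
use warnings;<br />
# Read the chrY FASTA (file argument or STDIN) into one uppercase string.<br />
my $chrY = '';<br />
while (&lt;&gt;) {<br />
chomp;<br />
next if /^&gt;/;   # skip the FASTA header line<br />
$chrY .= uc $_;<br />
}<br />
my $j = 0;   # read-pair counter<br />
open my $fh_l, '&gt;', 'read1.fa' or die "Cannot open read1.fa: $!";<br />
open my $fh_r, '&gt;', 'read2.fa' or die "Cannot open read2.fa: $!";<br />
# Four passes; the random step (50..59 bp) gives each pass different start positions.<br />
foreach my $pass (1 .. 4) {<br />
for (my $i = 600; $i &lt; length($chrY) - 600; $i += 50 + int(rand(10))) {<br />
my $up   = substr($chrY, $i, 100);         # read 1, forward strand<br />
my $down = substr($chrY, $i + 400, 100);   # read 2, 400 bp downstream<br />
# Skip the pair if either read contains no A/C/G/T at all (for example, all N).<br />
next unless $up   =~ /[ATCG]/;<br />
next unless $down =~ /[ATCG]/;<br />
# Reverse-complement read 2.<br />
$down = reverse $down;<br />
$down =~ tr/ATCG/TAGC/;<br />
$j++;<br />
print $fh_l "&gt;read_$j/1\n$up\n";<br />
print $fh_r "&gt;read_$j/2\n$down\n";<br />
}<br />
}<br />
close $fh_l;<br />
close $fh_r;</p>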
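<p>To sanity-check the pairing logic without running it on a whole chromosome, the same two steps (cut two windows, reverse-complement the second) can be tried on a toy sequence in the shell. This is only a minimal sketch: the function names <code>revcomp</code> and <code>toy_pair</code>, the toy sequence, and the small window sizes are made up for illustration, and it assumes <code>rev</code>, <code>tr</code> and <code>cut</code> are available.</p>

```shell
# Reverse-complement an uppercase DNA string (A/C/G/T only; other letters pass through).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Print read 1 and the reverse-complemented read 2 for one toy fragment.
# $1 = sequence, $2 = 0-based start, $3 = read length, $4 = offset of read 2 from read 1
toy_pair() {
  dna=$1; start=$2; len=$3; offset=$4
  up=$(printf '%s' "$dna" | cut -c "$((start + 1))-$((start + len))")
  down=$(printf '%s' "$dna" | cut -c "$((start + offset + 1))-$((start + offset + len))")
  printf '%s\n%s\n' "$up" "$(revcomp "$down")"
}

# Toy stand-in for (read length 100, offset 400): read length 4, offset 8.
toy_pair ACGTTGCAAGGCTTAC 2 4 8
# prints:
# GTTG
# AAGC
```

<p>Here read 2 is the window GCTT, which becomes AAGC after reverse-complementing, mirroring what the Perl script does with <code>reverse</code> and <code>tr</code>.</p>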
<p><strong>Then align the reads with bwa mem</strong></p>
<p><strong>bwa mem -t 12 -M chrX.fa read1.fa read2.fa &gt; read.sam</strong></p>
<p>With 12 threads this is also very fast: about 137 s of real time for roughly 1525 s of CPU time.</p>
<p>[main] Version: 0.7.8-r455</p>
<p>[main] CMD: /apps/bioinfo/bwa/bwa-0.7.8/bwa mem -t 12 -M chrX.fa read1.fa read2.fa</p>
<p>[main] Real time: 136.641 sec; CPU: 1525.360 sec</p>
<p>643M Oct 28 10:24 read.sam</p>
<p><strong>Then summarize the alignment results</strong></p>
<p><strong>samtools view -bS read.sam &gt;read.bam</strong></p>
<p>158M Oct 28 10:26 read.bam</p>
<p><strong>samtools flagstat read.bam </strong></p>
<p>3801483 + 0 in total (QC-passed reads + QC-failed reads)</p>
<p>0 + 0 duplicates</p>
<p><strong>2153410 + 0 mapped (56.65%:-nan%)</strong></p>
<p>3801483 + 0 paired in sequencing</p>
<p>1900666 + 0 read1</p>
<p>1900817 + 0 read2</p>
<p>645876 + 0 properly paired (16.99%:-nan%)</p>
<p>1780930 + 0 with itself and mate mapped</p>
<p>372480 + 0 singletons (9.80%:-nan%)</p>
<p>0 + 0 with mate mapped to a different chr</p>
<p>0 + 0 with mate mapped to a different chr (mapQ&gt;=5)</p>
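<p>The percentages in this report are simply each count divided by the total. As a rough sketch, they can be recomputed from the flagstat text with awk. The helper name <code>flagstat_pct</code> is hypothetical, and the field positions assume the flagstat layout shown above (count in field 1, label starting in field 4).</p>

```shell
# Recompute mapped / properly paired / singleton percentages from
# "samtools flagstat" text on stdin (layout as printed in this post).
flagstat_pct() {
  awk '
    $4 == "in"         { total = $1 }    # "N + 0 in total (...)"
    $4 == "mapped"     { mapped = $1 }   # "N + 0 mapped (...)"
    $4 == "properly"   { proper = $1 }   # "N + 0 properly paired (...)"
    $4 == "singletons" { single = $1 }   # "N + 0 singletons (...)"
    END {
      if (total == 0) exit 1
      printf "mapped: %.2f%%\n", 100 * mapped / total
      printf "properly paired: %.2f%%\n", 100 * proper / total
      printf "singletons: %.2f%%\n", 100 * single / total
    }'
}
```

<p>For example, <code>samtools flagstat read.bam | flagstat_pct</code> on the numbers above gives 56.65% mapped (2153410 / 3801483), 16.99% properly paired and 9.80% singletons, matching the report.</p>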
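<p>Looking through the SAM file myself, I also found that the homology really is high: of only about 3.8 million simulated reads, about 1.2 million aligned to chromosome X with a 100% match.</p>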
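<p>The post does not say how the perfectly matching reads were counted. One possible way, sketched here under the assumption that "100% match" means a full-length <code>100M</code> CIGAR with an edit distance of zero (<code>NM:i:0</code>, a tag that bwa mem writes), is to filter the SAM records with awk. The helper name <code>count_perfect</code> is hypothetical, and it counts alignment records, so any secondary alignments that satisfy the filter are included.</p>

```shell
# Count SAM records on stdin whose CIGAR is exactly 100M and whose NM tag is 0.
count_perfect() {
  awk -F '\t' '
    /^@/ { next }                   # skip SAM header lines
    $6 != "100M" { next }           # keep only full-length, gap-free alignments
    /\tNM:i:0(\t|$)/ { n++ }        # edit distance 0: no mismatches
    END { print n + 0 }'
}
```

<p>It can be fed either the SAM file directly (<code>count_perfect</code> reading <code>read.sam</code> on standard input) or the output of <code>samtools view read.bam</code> through a pipe.</p>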
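<p>So for a female sample, it is entirely normal for some sequencing reads to align to chromosome Y. To determine sex from sequencing data, you have to look at the regions where X and Y differ.</p>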
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1081.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
