<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; PRC2</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/prc2/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the forum at biotrainee.com, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>A hands-on ChIP-seq walkthrough: super simple, done in 2 hours!</title>
		<link>http://www.bio-info-trainee.com/2257.html</link>
		<comments>http://www.bio-info-trainee.com/2257.html#comments</comments>
		<pubDate>Tue, 10 Jan 2017 03:14:22 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[CHIP-seq]]></category>
		<category><![CDATA[Cbx7]]></category>
		<category><![CDATA[peaks]]></category>
		<category><![CDATA[PRC1]]></category>
		<category><![CDATA[PRC2]]></category>
		<category><![CDATA[Ring1B]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=2257</guid>
		<description><![CDATA[Please do not copy my code directly. Understand it, type it out yourself, and think about why I wrote it this way. Software &#8230; <a href="http://www.bio-info-trainee.com/2257.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div>
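<div><span style="color: #ff0000;"><strong>Please do not copy my code directly. Understand it, type it out yourself, and think about why I wrote it this way.</strong></span></div>
<div><span style="color: #ff0000;"><strong>Please use the latest versions of the software, especially tools such as samtools that I call from my system PATH. Because this post has many readers, I generally include version information for the other tools.</strong></span></div>
<div>It took me two hours, but that does not mean you will learn it in two hours. Some readers have told me it took them two weeks, which is perfectly normal. Do not expect to reach my level in two hours.</div>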
<div></div>
</div>
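<div>The paper chosen for this walkthrough studies protein complexes such as PRC1 and PRC2, so this is not a transcription-factor or histone ChIP-seq. Please note the difference!</div>
<div>This post is part of a series; you may want to read these first:</div>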
<div>
<div><a href="http://www.bio-info-trainee.com/1024.html">A worked example of expression microarray data processing</a></div>
<div><a href="http://www.bio-info-trainee.com/2218.html">A hands-on RNA-seq walkthrough: super simple, done in 2 hours!</a></div>
<div><a href="http://www.bio-info-trainee.com/1159.html">WES (7): looking at de novo variants</a></div>
<div><a href="http://www.bio-info-trainee.com/2169.html">[Live] My Genome 22: using IGV to check whether a specific site is mutated</a></div>
</div>
<div>The paper is: RYBP and Cbx7 define specific biological functions of polycomb complexes in mouse embryonic stem cells</div>
<div><a href="https://www.ncbi.nlm.nih.gov/pubmed/23273917">https://www.ncbi.nlm.nih.gov/pubmed/23273917</a></div>
<div>RYBP and Cbx7 are both components of Polycomb repressive complex 1 (PRC1).</div>
<div>All the data are at: <a href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE42466">https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE42466</a></div>
<div>so they can be batch-downloaded from the FTP site with a script:</div>
<div><a href="ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311">ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311</a></div>
<div><img class="alignnone size-full wp-image-2264" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/01/11.png" alt="1" width="477" height="169" /></div>
<div></div>
<p><span id="more-2257"></span></p>
<div>The download URLs are easy to work out:</div>
<div>for ((i=204;i&lt;=209;i++)) ;do wget <a href="ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311">ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311</a>/SRR620$i/SRR620$i.sra;done</div>
<div>ls *sra |while read id; do ~/biosoft/sratoolkit/sratoolkit.2.6.3-centos_linux64/bin/fastq-dump --split-3 $id;done</div>
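<div>Before starting the download, it can help to print the six URLs and check them by eye. The following is a minimal sketch: the function name <code>sra_urls</code> is made up for illustration, the base path is the one quoted above, and nothing is downloaded unless the list is passed on to wget.</div>
<pre>
```shell
# Print the download URL for each run, SRR620204 through SRR620209.
# The base path is the one given in the post; nothing is downloaded here.
base=ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311

sra_urls() {
  for i in 204 205 206 207 208 209; do
    echo "$base/SRR620$i/SRR620$i.sra"
  done
}

sra_urls
# To download for real, feed the list to wget:
#   sra_urls | xargs -n 1 wget
```
</pre>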
<div><span style="color: #ff0000;">(image missing)</span></div>
<div>I checked the data quality with FastQC, using the following command:</div>
<div>ls *fastq |xargs ~/biosoft/fastqc/FastQC/fastqc -t 10</div>
<div>The 3' ends showed some quality problems, so I aligned with the -3 5 --local options.</div>
<div>First, align the sequenced FASTQ files to the mm10 reference genome with bowtie2:</div>
<div>~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -p 6 -3 5 --local -x ~/reference/index/bowtie/mm10 -U SRR620204.fastq| samtools sort -O bam -o ring1B.bam</div>
<div>~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -p 6 -3 5 --local -x ~/reference/index/bowtie/mm10 -U SRR620205.fastq| samtools sort -O bam -o cbx7.bam</div>
<div>~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -p 6 -3 5 --local -x ~/reference/index/bowtie/mm10 -U SRR620206.fastq| samtools sort -O bam -o suz12.bam</div>
<div>~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -p 6 -3 5 --local -x ~/reference/index/bowtie/mm10 -U SRR620207.fastq| samtools sort -O bam -o RYBP.bam</div>
<div>~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -p 6 -3 5 --local -x ~/reference/index/bowtie/mm10 -U SRR620208.fastq| samtools sort -O bam -o IgGold.bam</div>
<div>~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -p 6 -3 5 --local -x ~/reference/index/bowtie/mm10 -U SRR620209.fastq| samtools sort -O bam -o IgG.bam</div>
<div><img class="alignnone size-full wp-image-2259" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/01/3.png" alt="3" width="220" height="111" /></div>
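<div>The six alignment commands above differ only in the run accession and the output name, so they could also be generated from a small lookup table. This is only a sketch: it assumes bowtie2 and samtools are on the PATH (the post calls bowtie2 by its full path), and the names <code>pairs</code> and <code>align_cmds</code> are made up here. It prints the commands rather than running them.</div>
<pre>
```shell
# Map each run accession to the sample name used in the post.
pairs="SRR620204:ring1B SRR620205:cbx7 SRR620206:suz12 SRR620207:RYBP SRR620208:IgGold SRR620209:IgG"

# Print one alignment command per sample; pipe the output to "sh" to run them.
align_cmds() {
  for pair in $pairs; do
    srr=${pair%%:*}      # text before the colon
    name=${pair##*:}     # text after the colon
    echo "bowtie2 -p 6 -3 5 --local -x ~/reference/index/bowtie/mm10 -U $srr.fastq | samtools sort -O bam -o $name.bam"
  done
}

align_cmds
```
</pre>
<div>Printing first makes it easy to confirm that each accession is paired with the intended sample before committing hours of compute.</div>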
<div><strong><span style="color: #ff0000;">The BAM files should next be filtered to remove unmapped and multi-mapped reads, but I was lazy and went straight to calling peaks with MACS2!</span></strong></div>
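<div>For readers who do want the filtering step that is skipped here, one possible sketch follows. It assumes samtools is on the PATH and uses a MAPQ cutoff of 30 as a rough stand-in for "not multi-mapped"; that cutoff, the <code>.filtered.bam</code> naming, and the helper names <code>run</code> and <code>filter_bam</code> are my assumptions, not something from the original analysis. With <code>DRY_RUN</code> set, the commands are only printed.</div>
<pre>
```shell
# Echo a command in dry-run mode, run it otherwise.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# Keep mapped reads (-F 4 drops reads flagged as unmapped) with MAPQ of at least 30.
# The cutoff of 30 is an assumption, not a value from the post.
filter_bam() {
  run samtools view -b -F 4 -q 30 -o "${1%.bam}.filtered.bam" "$1"
}

DRY_RUN=1                      # unset this to actually run samtools
for bam in ring1B cbx7 suz12 RYBP IgGold IgG; do
  filter_bam "$bam.bam"
done
```
</pre>
<div>If the filtered files are used, the MACS2 commands below would need to point at the <code>.filtered.bam</code> files instead.</div>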
<div>nohup ~/.local/bin/macs2 callpeak -c ../IgGold.bam -t ../suz12.bam -m 10 30 -p 1e-5 -f BAM -g mm -n suz12 2&gt;suz12.masc2.log &amp;</div>
<div>nohup ~/.local/bin/macs2 callpeak -c ../IgGold.bam -t ../ring1B.bam -m 10 30 -p 1e-5 -f BAM -g mm -n ring1B 2&gt;ring1B.masc2.log &amp;</div>
<div>nohup ~/.local/bin/macs2 callpeak -c ../IgG.bam -t ../cbx7.bam -m 10 30 -p 1e-5 -f BAM -g mm -n cbx7 2&gt;cbx7.masc2.log &amp;</div>
<div>nohup ~/.local/bin/macs2 callpeak -c ../IgG.bam -t ../RYBP.bam -m 10 30 -p 1e-5 -f BAM -g mm -n RYBP 2&gt;RYBP.masc2.log &amp;</div>
<div><img class="alignnone size-full wp-image-2260" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/01/4.png" alt="4" width="222" height="79" /></div>
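<div>As you can see, I got almost no peaks for the RYBP ChIP-seq, even after switching to the other control, unless I used no control at all! I looked at it in IGV and the RYBP data really do look odd; I suspect the authors made a mistake when uploading the data.</div>
<div>For comparison, the peak counts in the files the authors deposited in GEO are:</div>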
<div>2754 GSE42466_Cbx7_peaks_10.txt<br />
6982 GSE42466_Ring1b_peaks_10.txt<br />
6872 GSE42466_RYBP_peaks_5.txt<br />
8054 GSE42466_Suz12_peaks_10.txt</div>
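<div>To set my own results next to these numbers, the peaks in each MACS2 output can be counted the same way. MACS2 writes one peak per line to <code>NAME_peaks.narrowPeak</code>, so a line count is enough. A small sketch, with <code>count_peaks</code> being a made-up helper name:</div>
<pre>
```shell
# Print each narrowPeak file name and its number of peaks (one peak per line).
count_peaks() {
  for f in "${1:-.}"/*_peaks.narrowPeak; do
    [ -e "$f" ] || continue            # skip the unexpanded glob when nothing matches
    echo "$(basename "$f") $(awk 'END { print NR }' "$f")"
  done
}

count_peaks .
```
</pre>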
<div></div>
<div></div>
<div>First, batch-convert the BAM files to bigWig (bw) files, then plot each one in the same loop:</div>
<div>ls ../*bam |while read id</div>
<div>do</div>
<div>file=$(basename $id )</div>
<div>sample=${file%%.*}</div>
<div>echo $sample</div>
<div>bamCoverage -b $id -o $sample.bw ## options that can be added here: -p 10 --normalizeUsingRPKM</div>
<div>computeMatrix reference-point --referencePoint TSS -b 10000 -a 10000 -R ~/annotation/CHIPseq/mm10/ucsc.refseq.bed -S $sample.bw --skipZeros -o matrix1_${sample}_TSS.gz --outFileSortedRegions regions1_${sample}_genes.bed</div>
<div>plotHeatmap -m matrix1_${sample}_TSS.gz -out ${sample}.png</div>
<div>done</div>
<div><img class="alignnone size-full wp-image-2261" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/01/5.png" alt="5" width="199" height="98" /></div>
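<div>The loop above derives each sample name from the BAM path with <code>basename</code> and the <code>${file%%.*}</code> expansion. Isolated as a tiny function (the name <code>sample_name</code> is made up for illustration), the behaviour is easier to see:</div>
<pre>
```shell
# Turn a BAM path into the sample name used for output files.
sample_name() {
  file=$(basename "$1")   # drop the directory: ../ring1B.bam gives ring1B.bam
  echo "${file%%.*}"      # drop everything from the first dot onward
}

sample_name ../ring1B.bam        # prints ring1B
sample_name ../a.sorted.bam      # prints a
```
</pre>
<div>Because <code>%%</code> removes the longest matching suffix, a name such as <code>a.sorted.bam</code> is cut at the first dot, so sample names should not contain dots themselves.</div>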
<div></div>
<div>Next, combine all the ChIP-seq samples (their bw files) and plot the profile and heatmap around gene TSSs:</div>
<div>computeMatrix reference-point -p 10 --referencePoint TSS -b 2000 -a 2000 -S ../*bw -R ~/annotation/CHIPseq/mm10/ucsc.refseq.bed --skipZeros -o tmp4.mat.gz</div>
<div>plotHeatmap -m tmp4.mat.gz -out tmp4.merge.png</div>
<div>plotProfile --dpi 720 -m tmp4.mat.gz -out tmp4.profile.pdf --plotFileFormat pdf --perGroup</div>
<div>plotHeatmap --dpi 720 -m tmp4.mat.gz -out tmp4.merge.pdf --plotFileFormat pdf</div>
<div>Finally, combine all the ChIP-seq samples (their bw files) and plot the profile and heatmap across gene bodies:</div>
<div>computeMatrix scale-regions -p 10 -S ../*bw -R ~/annotation/CHIPseq/mm10/ucsc.refseq.bed -b 3000 -a 3000 -m 5000 --skipZeros -o tmp5.mat.gz</div>
<div>plotHeatmap -m tmp5.mat.gz -out tmp5.merge.png</div>
<div>plotProfile --dpi 720 -m tmp5.mat.gz -out tmp5.profile.pdf --plotFileFormat pdf --perGroup</div>
<div>plotHeatmap --dpi 720 -m tmp5.mat.gz -out tmp5.merge.pdf --plotFileFormat pdf</div>
<div>Below is an example of the output; I only show the plot around the TSS.</div>
<div><img class="alignnone size-full wp-image-2262" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/01/6.png" alt="6" width="1378" height="815" /></div>
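<div>The figure above shows that RYBP peaks are centered on the TSS, while the peaks of the other factors sit slightly downstream of it.</div>
<div>Sequential ChIP (re-ChIP) experiments do confirm that RYBP and Cbx7 peaks overlap.</div>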
<div><img class="alignnone size-full wp-image-2263" src="http://www.bio-info-trainee.com/wp-content/uploads/2017/01/7.png" alt="7" width="787" height="363" /></div>
<div></div>
<div>The paper keeps coming back to how the peaks from these ChIP-seq experiments overlap:</div>
<div></div>
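<div><p>The composition of PRC1 is unusually complex. It includes Cbx (Cbx2, Cbx4, Cbx6, Cbx7, or Cbx8); Ring1A or Ring1B; PHC (PHC1, PHC2, or PHC3); PCGF (PCGF1, PCGF2, PCGF3, PCGF4, PCGF5, or PCGF6); and RYBP or YAF2.<br />
Among these is a Ring1A/B E3 ligase subunit that monoubiquitinates histone H2A at lysine 119 (H2AK119ub).<br />
Not every component has to be present. Different combinations form a variety of complexes, all of which are collectively called PRC1.<br />
For example, mouse ESCs have two kinds of PRC1 in which Cbx7 and RYBP cannot coexist. We can call them Cbx7-PRC1 and RYBP-PRC1.<br />
Cbx7 recruits Ring1B to chromatin and is required for that recruitment. The genes it binds are mostly involved in early-lineage commitment of ESCs.<br />
RYBP enhances the enzymatic activity of PRC1; the genes it binds are mostly involved in regulation of metabolism and cell-cycle progression.<br />
Genes bound by RYBP are expressed at higher levels than genes bound by Cbx7, because Cbx7 binding is accompanied by recruitment of the repressive complex PRC2.<br />
PRC2 deposits the histone H3 lysine 27 trimethyl repressive mark (H3K27me3) through the Ezh1/2 histone methyltransferase enzymes.</p>
<p>So how does the paper describe the overlaps among these peaks?<br />
We observed an overlap of RYBP peaks (3,918 in total) with 14%, 42%, and 37% of Cbx7, Ring1B, and Suz12 peaks, respectively<br />
Moreover, although more than 90% of Cbx7 peaks contained Ring1B and Suz12, 20% were also bound by RYBP<br />
Although RYBP and Cbx7 are mutually exclusive in most cases, they do colocalize at a minority of genomic regions.</p>
<p>The Ring1B/Suz12 signal can be explained by the Cbx7 and RYBP peak status:<br />
Where both RYBP and Cbx7 are present, Ring1B/Suz12 is high.<br />
Where Cbx7 is present but not RYBP, Ring1B/Suz12 is slightly lower.<br />
Where RYBP is present but not Cbx7, Ring1B/Suz12 is lower still.<br />
Where neither RYBP nor Cbx7 is present, Ring1B/Suz12 is lowest.</p>
<p>RYBP peaks are centered on the TSS, while the peaks of the other factors sit slightly downstream of it.<br />
Sequential ChIP (re-ChIP) experiments do confirm that RYBP and Cbx7 peaks overlap. RYBP also has some peaks that the other PRC1 components lack, which suggests that it can act independently of PRC1.</p>
<p>H2AK119ub correlates positively with Ring1B/Suz12, but it overlaps RYBP by only 25.7%, compared with 72% for Cbx7, so PRC1 target genes can be divided into three classes:<br />
a first set with Cbx7/Ring1B/H2AK119ub;<br />
a second that contains RYBP and lower levels of Ring1B/H2AK119ub<br />
a third set cobound by RYBP/Cbx7/Ring1B and that also contains H2AK119ub.</p>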
<p>All of these gene lists can then be taken through GO/KEGG analysis to see whether they carry any biological meaning:<br />
genes co-occupied by Ring1B/Cbx7/RYBP and H2AK119ub are involved in system development.<br />
genes containing RYBP/Ring1B/H2AK119ub, but not Cbx7, have a strong association with the M phase of the meiotic cycle and cellular metabolism<br />
genes with Cbx7/Ring1B/H2AK119ub are involved in developmental processes and mesoderm specification,<br />
those containing RYBP/Cbx7/Ring1B/H2AK119ub predominantly represent the ectodermal fate and, to a lesser extent, mesoderm and endoderm fates</p>
<p>More than 700 genes carry RYBP/Cbx7/Ring1B peaks, so the authors knocked out Cbx7 to see whether the RYBP peaks would change. They did not run ChIP-seq for this, only ChIP-qPCR.</p>
<p>The following conclusion is important:<br />
Overall, our ChIP-seq analysis allowed us to identify five types of genes according to the occupancy of PRC1 and PRC2: those with<br />
(1) Ring1B/Cbx7/RYBP and Suz12 (725 genes);<br />
(2) Ring1B/Cbx7/Suz12, but not RYBP (1,527 genes);<br />
(3) Ring1B/RYBP/Suz12, but not Cbx7 (861 genes);<br />
(4) only Ring1B and Suz12 (1,694 genes); or<br />
(5) RYBP but no Polycomb proteins (1,674)</p>
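<p>Given one gene list per factor, the five classes above amount to a set-membership lookup. The sketch below is one way to do that with awk and is not the authors' method. It assumes four non-empty, de-duplicated files with one gene symbol per line, passed in the order Ring1B, Cbx7, RYBP, Suz12; the function name <code>classify_genes</code> and the class labels are made up for illustration.</p>
<pre>
```shell
# Classify genes by which factors have a peak on them.
# Arguments: four gene-list files (one gene per line), in the order
# Ring1B, Cbx7, RYBP, Suz12. Files must be non-empty and de-duplicated.
classify_genes() {
  awk '
    FNR == 1 { idx++ }                    # a new input file starts
    NF { seen[$1] = seen[$1] idx }        # append the file index, e.g. "124"
    END {
      for (g in seen) {
        k = seen[g]
        if      (k == "1234") c = "1_Ring1B_Cbx7_RYBP_Suz12"
        else if (k == "124")  c = "2_Ring1B_Cbx7_Suz12_noRYBP"
        else if (k == "134")  c = "3_Ring1B_RYBP_Suz12_noCbx7"
        else if (k == "14")   c = "4_Ring1B_Suz12_only"
        else if (k == "3")    c = "5_RYBP_only"
        else                  c = "other"
        print g "\t" c
      }
    }' "$1" "$2" "$3" "$4" | sort
}

# Usage with hypothetical file names:
#   classify_genes ring1b.txt cbx7.txt rybp.txt suz12.txt
```
</pre>
<p>The output is one line per gene with its class, which can then be split into the per-class lists used for GO/KEGG analysis.</p>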
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/2257.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
