<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>生信菜鸟团 &#187; CCLE</title>
	<atom:link href="http://www.bio-info-trainee.com/tag/ccle/feed" rel="self" type="application/rss+xml" />
	<link>http://www.bio-info-trainee.com</link>
	<description>Join the discussion on the biotrainee.com forum, or follow the WeChat official account of the same name, biotrainee</description>
	<lastBuildDate>Sat, 28 Jun 2025 14:30:13 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.33</generator>
	<item>
		<title>Joint analysis of RNA-seq and copy number variation data from the 1,000+ cell lines in the CCLE database</title>
		<link>http://www.bio-info-trainee.com/3040.html</link>
		<comments>http://www.bio-info-trainee.com/3040.html#comments</comments>
		<pubDate>Wed, 14 Feb 2018 14:43:20 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[生信组学技术]]></category>
		<category><![CDATA[CCLE]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=3040</guid>
		<description><![CDATA[The last figure in the supplementary material of this Science paper caught my eye, so I would like to reproduce the analysis. &#8230; <a href="http://www.bio-info-trainee.com/3040.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I noticed that the last figure in the supplementary material of this Science paper is:<span id="more-3040"></span></p>
<p><a href="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/highly-correlated-CNV-by-SNP6array-and-RNA-seq.png"><img class="alignnone size-full wp-image-3041" src="http://www.bio-info-trainee.com/wp-content/uploads/2018/02/highly-correlated-CNV-by-SNP6array-and-RNA-seq.png" alt="highly-correlated-cnv-by-snp6array-and-rna-seq" width="1970" height="1428" /></a></p>
<p>So I would like to reproduce this analysis.</p>
<p>I will update this post once the reproduction is done.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/3040.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analyses you can do with the CCLE database</title>
		<link>http://www.bio-info-trainee.com/1327.html</link>
		<comments>http://www.bio-info-trainee.com/1327.html#comments</comments>
		<pubDate>Mon, 11 Jan 2016 11:26:21 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[CCLE]]></category>
		<category><![CDATA[癌症]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1327</guid>
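		<description><![CDATA[So much expression, copy number variation, and mutation data has been collected from cancer cell lines; it should not just sit there gathering dust &#8230; <a href="http://www.bio-info-trainee.com/1327.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>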
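				<content:encoded><![CDATA[<div><span style="font-family: Times New Roman;">So much expression, copy number variation, and mutation data has been collected from cancer cell lines; it should not just sit there gathering dust!</span></div>
<div><span style="font-family: Times New Roman;">These data can be used in many ways, yet a Google search turns up few papers that cite them. I picked several and worked out how the authors used the data; unsurprisingly, they mostly used the mRNA expression data!</span></div>
<div><span style="font-family: Times New Roman;">Paper 1: <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111146">http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111146</a></span></div>
<div>This paper processes the CCLE data in eight steps; a competent bioinformatics analyst should be able to reimplement the whole pipeline.</div>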
<div><span style="font-family: Times New Roman;"><b><span style="color: #ff0000;">step1:</span></b>Affymetrix U133 Plus2 DNA microarray gene expressions of 27 gastric cancer cell lines (Kato-III, IM95, SNU-620, SNU-16, OCUM-1, NUGC-4, 2313287, HUG1N, MKN45, NCIN87, KE39, AGS, SNU-5, SNU-216, NUGC-3, NUGC-2, MKN74, MKN7, RERFGC1B, GCIY, KE97, Fu97, SH10TC, MKN1, SNU-1, Hs746 T, HGC27) were downloaded from Cancer Cell Line Encyclopedia (CCLE) <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111146#pone.0111146-Barretina1">[16]</a> in March 2013.</span></div>
<div><span style="font-family: Times New Roman;"><b><span style="color: #ff0000;">step2:</span></b> Robust Multi-array Average (RMA) normalization was performed. Principal component analysis plot show no obvious batch effect.</span></div>
<div><span style="font-family: Times New Roman;"><b><span style="color: #ff0000;">step3:</span></b> The normalized data is then collapsed by taking the probe sets with highest gene expression.</span></div>
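<div><span style="font-family: Times New Roman;">The first three steps produce the mRNA expression matrix for the 27 gastric cancer cell lines: download the CEL files, normalize them with RMA, and, for genes with multiple probe sets, keep the probe set with the highest expression!</span></div>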
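<div>A minimal R sketch of step 3, assuming the RMA-normalized matrix is already in memory. The toy values, the probe set IDs, and the <b>probe2gene</b> mapping are made up for illustration, and ranking probe sets by their mean across cell lines is only one reading of "highest gene expression"; the paper does not say which statistic it used.</div>

```r
# Toy RMA-normalized matrix: rows are probe sets, columns are cell lines.
# All values and the probe-to-gene mapping are hypothetical.
expr <- matrix(
  c(5.1, 5.3, 5.0,
    7.2, 7.0, 7.4,
    6.1, 6.0, 6.3,
    4.0, 4.2, 4.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ps1", "ps2", "ps3", "ps4"), c("AGS", "MKN45", "SNU-1"))
)
probe2gene <- c(ps1 = "GENE_A", ps2 = "GENE_A", ps3 = "GENE_B", ps4 = "GENE_B")

# Rank probe sets by mean expression (an assumption, see above), then keep
# the first, i.e. top-ranked, probe set of every gene.
ord <- order(rowMeans(expr), decreasing = TRUE)
keep <- ord[!duplicated(probe2gene[rownames(expr)[ord]])]
collapsed <- expr[keep, , drop = FALSE]
rownames(collapsed) <- unname(probe2gene[rownames(collapsed)])
print(collapsed)
```

<div>The result has one row per gene, which is the shape the later clustering steps expect.</div>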
<div>
<p><b><span style="color: #ff0000;">step4:</span></b>Unsupervised hierarchical clustering (1-Spearman distance, average linkage) was performed on the cell lines using the aCGH data.</p>
<p><b>Putative driver genes of which copy number aberrations correlated to mRNA gene expression were identified to determine subtypes or clusters that are driven by different mechanisms.</b><span class="Apple-converted-space"> </span>This was done using Mann Whitney U-test with p&lt;0.05, and Spearman Correlation Coefficient test with Rho &gt;0.6.</p>
<p><b><span style="color: #ff0000;">step5:</span></b>We then performed consensus clustering<a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111146#pone.0111146-Wilkerson1">[17]</a> on the gene expression data of the 27 gastric cancer cell lines from CCLE using these putative driver genes.<span class="Apple-converted-space"> </span><b>We selected k = 2 as it gives sufficiently stable similarity matrix.</b></p>
<p><b><span style="color: #ff0000;">step6:</span></b> In order to assign new samples to this integrative cluster, significance analysis of microarray<span class="Apple-converted-space"> </span><b>(SAM) </b><a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111146#pone.0111146-Tusher1">[18]</a>with threshold q&lt;2.0 was used to generate<span class="Apple-converted-space"> </span><b>subtype signature</b><span class="Apple-converted-space"> </span>based on the mRNA expression data of the 1762 genes from the 27 gastric cancer cell lines in CCLE.</p>
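<p><span style="font-family: Times New Roman;">First cluster the cell lines on the aCGH copy number data and identify putative driver genes (genes whose copy number aberrations correlate with their mRNA expression), then cluster again on the expression of these genes to split the cell lines into two groups, and finally run SAM on the two groups to find differentially expressed genes.<br />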
</span></div>
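<div>The correlation part of the driver gene selection can be sketched in R as follows. This is only an illustration under stated assumptions: the two small matrices are invented, the gene names are placeholders, and only the Spearman criterion (Rho above 0.6) is shown; the Mann Whitney U-test that the paper also applies is left out.</div>

```r
# Hypothetical copy number (log ratio) and expression values for
# three genes across eight cell lines.
cn <- rbind(
  GENE_A = c(-1.2, -0.5, 0.0, 0.3, 0.8, 1.1, 1.9, 2.4),
  GENE_B = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8),
  GENE_C = c(0.9, 0.1, -0.4, 0.6, -0.8, 0.2, -0.1, 0.5)
)
expr <- rbind(
  GENE_A = 6 + 0.9 * cn["GENE_A", ],   # expression follows copy number
  GENE_B = c(5, 2, 7, 1, 8, 3, 6, 4),  # no clear relation to copy number
  GENE_C = 7 - cn["GENE_C", ]          # expression opposes copy number
)

# Per-gene Spearman correlation between copy number and expression.
rho <- sapply(rownames(cn), function(g) {
  cor(cn[g, ], expr[g, ], method = "spearman")
})

# Genes passing the Rho cutoff quoted from the paper.
drivers <- names(rho)[rho > 0.6]
print(round(rho, 2))
print(drivers)
```

<div>Only GENE_A passes the cutoff in this toy data; on real data the two matrices would first have to be matched on genes and cell lines.</div>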
<div><span style="font-family: Times New Roman;"><b><span style="color: #ff0000;">step7:</span></b>ssGSEA (single sample GSEA)was used to estimate pathway activities of the gastric cancer cell line in the Molecular Signature Database v3.1<b><span class="Apple-converted-space"> </span>(Msigdb v3.1)</b> <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111146#pone.0111146-Subramanian1">[19]</a>, <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111146#pone.0111146-Verhaak1">[20]</a>. The pathway activities are represented in enrichment scores which were rank normalized to [0.0, 1.0]. </span></div>
<div><span style="font-family: Times New Roman;"><b><span style="color: #ff0000;">step8:</span></b>SAM analysis was performed with threshold q&lt;0.2, and fold change &gt;2.0 (for up-regulated pathways), or &lt;0.5 (for down-regulated pathways) to obtain<span class="Apple-converted-space"> </span><b>subtype-specific pathways</b><span class="Apple-converted-space"> </span>from the 27 gastric cell lines in CCLE.<br />
</span></div>
<div><span style="font-family: Times New Roman;">Both gene set enrichment analysis and hypergeometric enrichment analysis are used here; see the paper for the results!</span></div>
<div><span style="font-family: Times New Roman;"> </span></div>
<div><span style="font-family: Times New Roman;">Paper 2: <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0081803#pone.0081803.s001">http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0081803#pone.0081803.s001</a></span></div>
<div>This paper uses CCLE for just one thing: a boxplot of a single gene's expression across cancer types.</div>
<div>Both the expression data and the sample classification behind this figure can be obtained with GEOquery, so the figure below is very easy to make.</div>
<div><span style="font-family: Times New Roman;">Further, the Cancer Cell Line Encyclopedia (CCLE) database demonstrated that of 1062 cell lines representing 37 distinct cancer types, glioma cell lines express the highest levels of STK17A<br />
</span></div>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/11.png"><img class="alignnone size-full wp-image-1328" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/11.png" alt="1" width="798" height="426" /></a></div>
<div>
<p><b>The conclusion: STK17A is highly expressed in glioma cell lines compared to other cancer types. Data was obtained through the Cancer Cell Line Encyclopedia (CCLE).</b></p>
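<p>A boxplot of this kind could be drawn along the following lines. It is a sketch with invented numbers: <b>gene_expr</b> and <b>cancer_type</b> are hypothetical stand-ins for one row of the CCLE expression matrix and for the tissue labels from the sample metadata.</p>

```r
# Hypothetical expression values of one gene and the cancer type
# of each cell line (both vectors are made up).
gene_expr <- c(9.1, 8.7, 9.4, 6.2, 6.8, 5.9, 7.1, 7.4)
cancer_type <- c("glioma", "glioma", "glioma",
                 "breast", "breast", "breast",
                 "lung", "lung")

# Order the groups by median so the highest-expressing type comes first.
med <- sort(tapply(gene_expr, cancer_type, median), decreasing = TRUE)
cancer_type <- factor(cancer_type, levels = names(med))

boxplot(gene_expr ~ cancer_type, las = 2, ylab = "log2 expression",
        main = "Expression of one gene across cancer types")
print(med)
```

<p>Sorting the factor levels by median is a presentation choice, not something taken from the paper.</p>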
<p><b>Paper 3: <a href="http://www.nature.com/ncomms/2013/130709/ncomms3126/fig_tab/ncomms3126_F4.html">http://www.nature.com/ncomms/2013/130709/ncomms3126/fig_tab/ncomms3126_F4.html</a></b></p>
</div>
<div><span style="font-family: Times New Roman;">This paper is even simpler: it clusters the expression matrix directly:</span></div>
<div>
<h2><a href="http://www.nature.com/ncomms/2013/130709/ncomms3126/full/ncomms3126.html" target="_blank">Evaluating cell lines as tumour models by comparison of genomic profiles</a></h2>
<div><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/2.jpeg"><img class="alignnone  wp-image-1329" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/2.jpeg" alt="2" width="702" height="409" /></a></div>
</div>
<div><span style="font-family: Times New Roman;"> </span></div>
<div><span style="font-family: Times New Roman;"><b>The 5,000 most variable genes</b><span class="Apple-converted-space"> </span>were used for unsupervised clustering of cell lines by mRNA expression data. Cell lines are colour-coded (vertical bars) according to the reported tissue of origin (a PDF version that can be enlarged at high resolution is in <a href="http://www.nature.com/ncomms/2013/130709/ncomms3126/full/ncomms3126.html#supplementary-information">Supplementary Information</a>, <a href="http://www.nature.com/ncomms/2013/130709/ncomms3126/full/ncomms3126.html#supplementary-information">Supplementary Fig. S4</a>); horizontal labels at bottom indicate the dominating tissue types within the respective branches of the dendrogram. Most ovarian cancer cell lines (magenta) cluster together, interspersed with endometrial cell lines. However, some ovarian cancer cell lines cluster with other tissue types (*). Top right panels: neighbourhoods (1) of the top cell lines in our analysis, (2) of cell line IGROV1, and (3) of cell line A2780. For the ovarian cancer cell lines in these enlarged areas, the histological subtype as assigned in the original publication is indicated by coloured letters.<br />
</span></div>
<div><span style="font-family: Times New Roman;">Just take the whole expression matrix, pick the 5,000 most variable genes, and cluster the cell lines on them to get a similar figure.</span></div>
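<div>A rough R sketch of that recipe, run on a random toy matrix rather than the real CCLE data. The matrix size, the value of <b>n_top</b>, and the choice of 1 minus Pearson correlation with average linkage are all assumptions; the quoted caption does not state which distance or linkage the authors used.</div>

```r
set.seed(42)
# Toy matrix: 200 genes x 12 cell lines; real CCLE data would replace this.
expr <- matrix(rnorm(200 * 12), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("line", 1:12)))

n_top <- 50  # the paper uses 5,000; 50 suits the toy matrix

# Variance of every gene across cell lines, then the n_top most variable genes.
gene_var <- apply(expr, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_top)]

# Cluster the cell lines (columns); distance and linkage are assumptions.
d <- as.dist(1 - cor(expr[top_genes, ], method = "pearson"))
hc <- hclust(d, method = "average")
plot(hc, main = "Cell lines clustered on the most variable genes")
```

<div>With the real matrix, the columns could then be colored by tissue of origin to compare against the published dendrogram.</div>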
<div></div>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1327.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A few things to know about the CCLE database</title>
		<link>http://www.bio-info-trainee.com/1324.html</link>
		<comments>http://www.bio-info-trainee.com/1324.html#comments</comments>
		<pubDate>Mon, 11 Jan 2016 11:03:47 +0000</pubDate>
		<dc:creator><![CDATA[ulwvfje]]></dc:creator>
				<category><![CDATA[cancer]]></category>
		<category><![CDATA[CCLE]]></category>
		<category><![CDATA[数据库]]></category>
		<category><![CDATA[细胞系]]></category>

		<guid isPermaLink="false">http://www.bio-info-trainee.com/?p=1324</guid>
		<description><![CDATA[The paper that published CCLE: http://www.ncbi.nlm.nih.gov/pm &#8230; <a href="http://www.bio-info-trainee.com/1324.html">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div><span style="font-family: Times New Roman;">The paper that published CCLE: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320027/">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320027/</a></span></div>
<div><span style="font-family: Times New Roman;">Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines. </span></div>
<div><span style="font-family: Times New Roman;">Three types of data were collected:</span></div>
<div><span style="font-family: Times New Roman;">The mutational status of &gt;1,600 genes was determined by targeted massively parallel sequencing, followed by removal of variants likely to be germline events.</span></div>
<div><span style="font-family: Times New Roman;">Moreover, 392 recurrent mutations affecting 33 known cancer genes were assessed by mass spectrometric genotyping.</span></div>
<div><span style="font-family: Times New Roman;">DNA copy number was measured using high-density single nucleotide polymorphism arrays (Affymetrix SNP 6.0; Supplementary Methods).</span></div>
<div><span style="font-family: Times New Roman;">Finally, mRNA expression levels were obtained for each of the lines using Affymetrix U133 plus 2.0 arrays.</span></div>
<div><span style="font-family: Times New Roman;">These data were also used to confirm cell line identities.</span></div>
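<div><span style="font-family: Times New Roman;">The expression data is used the most because it is the simplest; most bioinformatics analysts only know how to work with this data!</span></div>
<div><span style="font-family: Times New Roman;">Its mutation data, meanwhile, was not obtained by what is usually meant by high-throughput sequencing, and many people have never even heard of SNP6 array data.</span></div>
<div><span style="font-family: Times New Roman;">The paper's <b>supplementary files</b> describe the cell lines in detail.</span></div>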
<div><span style="font-family: Times New Roman;"><a href="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/different_kinds_of_cancer_in_CCLE.png"><img class="alignnone size-full wp-image-1325" src="http://www.bio-info-trainee.com/wp-content/uploads/2016/01/different_kinds_of_cancer_in_CCLE.png" alt="different_kinds_of_cancer_in_CCLE" width="489" height="411" /></a></span></div>
<div><span style="font-family: Times New Roman;">The CCLE data can be downloaded from the Broad Institute and is also deposited in the GEO database; I prefer the GEO copy.</span></div>
<div><a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36139"><span style="font-family: Times New Roman;">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36139</span></a></div>
<div><span style="font-family: Times New Roman;">This SuperSeries is composed of the following SubSeries:</span></div>
<div><span style="font-family: Times New Roman;">GSE36133 Expression data from the Cancer Cell Line Encyclopedia (CCLE)</span></div>
<div><span style="font-family: Times New Roman;">GSE36138 SNP array data from the Cancer Cell Line Encyclopedia (CCLE)</span></div>
<div><span style="font-family: Times New Roman;">The <b>metadata</b> of the GSE36133 study describes the cancer each cell line was derived from!</span></div>
<div><span style="font-family: Times New Roman;">Some people like to call this <b>metadata clinical data.</b></span></div>
<div><span style="font-family: Times New Roman;">library(GEOquery)</span></div>
<div><span style="font-family: Times New Roman;">ccleFromGEO &lt;- getGEO("GSE36133")</span></div>
<div><span style="font-family: Times New Roman;">annotBlock1 &lt;-<span class="Apple-converted-space"> </span><b>pData</b>(<b>phenoData</b>(ccleFromGEO[[1]]))</span></div>
<div><span style="font-family: Times New Roman;">&gt;dim(annotBlock1)</span></div>
<div><span style="font-family: Times New Roman;">[1] 917  38</span></div>
<div><span style="font-family: Times New Roman;">exprSet=exprs(ccleFromGEO[[1]])</span></div>
<div><span style="font-family: Times New Roman;">&gt; dim(<b>exprSet</b>)</span></div>
<div><span style="font-family: Times New Roman;">[1] 18926   917</span></div>
<div><span style="font-family: Times New Roman;">## The expression matrix contains 18926 genes: the columns are the 917 cell lines and the rows are gene Entrez IDs</span></div>
<div><span style="font-family: Times New Roman;">keyColumns &lt;- c("title", "source_name_ch1", "characteristics_ch1", "characteristics_ch1.1", </span></div>
<div><span style="font-family: Times New Roman;">    "characteristics_ch1.2")</span></div>
<div><span style="font-family: Times New Roman;">options(stringsAsFactors = FALSE)</span></div>
<div><span style="font-family: Times New Roman;">allAnnot=annotBlock1[,keyColumns]</span></div>
<div><span style="font-family: Times New Roman;">## These columns hold the most important metadata: they record each cell line's source institution or company, tissue, cancer classification, and so on</span></div>
<div>
<div><span style="font-family: Times New Roman;">Cell Line Gene Sets (profiles of 1035 cell lines)</span></div>
<div><span style="font-family: Times New Roman;">1035 sets of genes with high or low expression in each cell line relative to other cell lines from the CCLE Cell Line Gene Expression Profiles dataset.</span></div>
<div><a href="http://amp.pharm.mssm.edu/Harmonizome/dataset/CCLE+Cell+Line+Gene+Expression+Profiles"><span style="font-family: Times New Roman;">http://amp.pharm.mssm.edu/Harmonizome/dataset/CCLE+Cell+Line+Gene+Expression+Profiles</span></a></div>
<div><span style="font-family: Times New Roman;">Some papers about the CCLE database:</span></div>
<div><a href="http://cancerres.aacrjournals.org/content/73/8_Supplement/2409.short"><span style="font-family: Times New Roman;">http://cancerres.aacrjournals.org/content/73/8_Supplement/2409.short</span></a></div>
<div><a href="http://cancerres.aacrjournals.org/content/74/22/6390.short"><span style="font-family: Times New Roman;">http://cancerres.aacrjournals.org/content/74/22/6390.short</span></a></div>
<div><a href="https://clincancerres.aacrjournals.org/content/19/19_Supplement/IA2.abstract"><span style="font-family: Times New Roman;">https://clincancerres.aacrjournals.org/content/19/19_Supplement/IA2.abstract</span></a></div>
<div><span style="font-family: Times New Roman;"><a href="http://onlinelibrary.wiley.com/doi/10.1002/cncy.21471/pdf">http://onlinelibrary.wiley.com/doi/10.1002/cncy.21471/pdf</a><span class="Apple-converted-space"> </span>introduces several similar database resources</span></div>
<div><span style="font-family: Times New Roman;"><a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0088557">http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0088557</a><span class="Apple-converted-space"> </span> explains the high/low concept</span></div>
<div><span style="font-family: Times New Roman;"><a href="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7060697">http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7060697</a><span class="Apple-converted-space"> </span> drug-related</span></div>
<div><span style="font-family: Times New Roman;">Anticancer drug sensitivity analysis: An integrated approach applied to Erlotinib sensitivity prediction in the CCLE database</span></div>
<div><span style="font-family: Times New Roman;"><a href="http://biorxiv.org/content/biorxiv/early/2015/10/02/028159.full.pdf">http://biorxiv.org/content/biorxiv/early/2015/10/02/028159.full.pdf</a><span class="Apple-converted-space"> </span>compares CCLE and TCGA data</span></div>
<div></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.bio-info-trainee.com/1324.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
