发表于Cancer Cell. 2018 Sep 10;文章是：Overcoming Resistance to Dual Innate Immune and MEK Inhibition Downstream of KRAS. 主要探究的是耐药性，用的是细胞系，但是这篇文章我最关心的是研究团队对转录组和表观数据的结合，还有对CCLE和TCGA这样的公共数据库的挖掘利用。
- The accession number for the RNA-seq data reported in this paper is GEO: GSE96779.
- The accession number for the ChIP-seq data reported in this paper is GEO: GSE96780.
不幸的是：Accession “GSE96779” is currently private and is scheduled to be released on Mar 16, 2020. 虽然现在还不能下载，但是只要这个数据集正式公开，就是一个绝佳的例子。
很中规中矩的流程，没什么好说的，重点就是找到peaks即可： Sequenced reads from histone H3K27 acetylation ChIPed DNA were aligned to hg19 using Bowtie2 with -k mode 1 and enriched ChIP peaks were called by MACS algorithm at the threshold of p < 1x10-5.
找到peaks后续的下游分析也比较简单：Peak signals from two sets of the ChIP peaks (MSR-A549 cells vs. A549 cells) were normalized using MAnorm algorithm and 4,251 peaks were identified to be significantly increased on MSR-A549 cells at the arbitrary threshold at p < 1x10-25.
同样是很中规中矩的流程，而且是非常老旧的流程，可能是因为惯性的力量，不过通常我们认为老旧的流程并没有什么定量准确性的问题，只不过是速度慢了一点而已。当然我还是推荐大家转为 hisat2+featureCounts 流程： Sequenced reads from RNA-seq were aligned with Tophat aligner to UCSC Refgene hg19. Transcripts were quantified using Cufflinks and differentially expressed transcripts were identified by Cuffdiff.
Those 4,251 peaks with increased signal upon resistance were assigned to genes potentially contributed for its expression using Citrome BETA tool. Of those genes, 986 genes were identified to have their expression also significantly increased in MSR-A549 cells to momelotinib/selumetinib treatment at FDR q < 0.25.
From the resulting genes, 659 genes were identified by selecting genes with the average of RPKM in MSR-A549 cells > 0.1 and fold change of average RPKM (MSR-A549 cells/A549 cells) > 1.5, and shown in Table S1 as resistance-related genes.
Lung adenocarcinoma cell lines in CCLE repository are subdivided into 2 classes
- KP (n = 9, H2009, H358, H1792, CALU6, H441, RERFLCAD2, HCC2108, HCC1171 and H2291) harboring KRAS and TP53 mutation/deletion with intact LKB1
- KL (n = 10, H1734, HCC44, H647, H2122, H1573, H1355, A549, H2030, H23 and H1944) harboring KRAS and LKB1 mutation
The RPKM values for each cell line was obtained from CCLE repository, and then 386 differentially expressed genes between KP and KL cell lines (p < 0.05 and FDR q < 0.25) were identified using R platform and TCC package (Sun et al., 2013). Of those genes, 168 genes were upregulated specifically in KP cell lines (M > 0, A > 0), and shown in Figure 7B (Top 30 genes) and Table S2 (168 genes).
The level3 RNA-seq V2 datasets for lung adenocarcinoma samples were downloaded from TCGA data portal and classified into KP samples or KL samples according to their mutation status (See Table S3). Samples having mutations on both LKB1 and TP53 were excluded. TCGA ID: TCGA-78-7160, TCGA-78-7166 and TCGA-78-7540 were further eliminated because they were previously identified as a different subtype from KP or KL (Skoulidis et al., 2015). Then differentially expressed genes between KP samples (n = 21) and KL samples (n = 17) were analyzed by Gene Set Enrichment Analysis (GSEA) with YAP1 signature (Dupont et al., 2011) and MSigDB C3 signature TEAD1 target.