




The dataset is https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE118719

7 NPC biopsy specimens and 4 normal nasopharyngeal mucosal specimens were sampled. Total RNA were extracted from these samples, and analyzed by RNA-sequencing.

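The authors provide the RNA-seq expression matrix: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE118nnn/GSE118719/suppl/GSE118719_mrna.expression.tsv.gz . The dataset also offers the raw sequencing data for download, so it is easy to run your own analysis pipeline and produce your own expression matrix.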
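
As a minimal sketch of loading such a matrix, the snippet below parses a gzipped, tab-separated file into a plain dictionary. The layout it assumes (a header row with an ID column followed by sample columns, then one gene per row) has not been checked against GSE118719_mrna.expression.tsv.gz, and the function name `read_expression_tsv` and the demo values are made up for illustration.

```python
import csv
import gzip
import io


def read_expression_tsv(gz_bytes):
    """Parse a gzipped, tab-separated expression matrix.

    Assumption: the first row is a header (ID column, then samples)
    and every following row holds one gene with numeric values.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8")
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        matrix[row[0]] = [float(value) for value in row[1:]]
    return samples, matrix


# Tiny made-up file standing in for the real download.
demo = gzip.compress(b"gene\tS1\tS2\nGENE_A\t1.5\t2.0\nGENE_B\t0\t3.25\n")
samples, matrix = read_expression_tsv(demo)
print(samples, matrix["GENE_B"])
```

For the real file, the bytes could come from `urllib.request.urlopen(url).read()` with the URL above; if the columns turn out to be laid out differently, the parsing needs to be adjusted.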



  • Paper: Upregulated long non-coding RNA AFAP1-AS1 expression is associated with progression and poor prognosis of nasopharyngeal carcinoma. Oncotarget 2015 Aug 21;6(24):20404-18. PMID: 26246469
  • Platform: [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array
  • Dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE64634
  • Experimental design: Total RNA extracted from laser-captured epithelium from 12 nasopharyngeal carcinomas and 4 normal healthy nasopharyngeal tissue specimens.
GSM1575894 normal nasopharyngeal tissue, specimen N1
GSM1575895 normal nasopharyngeal tissue, specimen N2
GSM1575896 normal nasopharyngeal tissue, specimen N3
GSM1575897 normal nasopharyngeal tissue, specimen N4
GSM1575898 nasopharyngeal carcinoma, specimen T1
GSM1575899 nasopharyngeal carcinoma, specimen T2
GSM1575900 nasopharyngeal carcinoma, specimen T3
GSM1575901 nasopharyngeal carcinoma, specimen T4
GSM1575902 nasopharyngeal carcinoma, specimen T5
GSM1575903 nasopharyngeal carcinoma, specimen T6
GSM1575904 nasopharyngeal carcinoma, specimen T7
GSM1575905 nasopharyngeal carcinoma, specimen T8
GSM1575906 nasopharyngeal carcinoma, specimen T9
GSM1575907 nasopharyngeal carcinoma, specimen T10
GSM1575908 nasopharyngeal carcinoma, specimen T11
GSM1575909 nasopharyngeal carcinoma, specimen T12
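
The sample titles above already encode the grouping needed for a tumor-versus-normal comparison. A small sketch of turning such a listing into group labels follows; the helper names `group_from_title` and `parse_sample_list` are hypothetical, and the matching rule (titles starting with "normal" versus titles containing "carcinoma") is an assumption that fits this listing only.

```python
def group_from_title(title):
    """Return 'normal' or 'tumor' based on a GEO sample title."""
    lowered = title.lower()
    if lowered.startswith("normal"):
        return "normal"
    if "carcinoma" in lowered:
        return "tumor"
    raise ValueError(f"unrecognized title: {title!r}")


def parse_sample_list(text):
    """Map each GSM accession to its group label."""
    groups = {}
    for line in text.strip().splitlines():
        gsm, title = line.strip().split(maxsplit=1)
        groups[gsm] = group_from_title(title)
    return groups


# A few lines copied from the listing above.
sample_text = """
GSM1575894 normal nasopharyngeal tissue, specimen N1
GSM1575897 normal nasopharyngeal tissue, specimen N4
GSM1575898 nasopharyngeal carcinoma, specimen T1
GSM1575909 nasopharyngeal carcinoma, specimen T12
"""
groups = parse_sample_list(sample_text)
print(groups)
```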



First is the Sep 30, 2019 study Long non-coding RNAs and mRNAs expression profilling in human nasopharyngeal carcinoma. The dataset is GSE126683, run on the Agilent-045997 Arraystar human lncRNA microarray V3 (Probe Name Version) array platform.

GSM3611201 1_GX5: Normal
GSM3611202 2_GX6: Normal
GSM3611203 3_GX8: Normal
GSM3611204 4_662: NPC
GSM3611205 5_667: NPC
GSM3611206 6_751: NPC


We performed genome-wide lncRNAs expression in 3 pairs of NPC and normal nasopharynx tissues and identified 384 dysregulated lncRNAs (fold change ≥2 and P <0.05).

From the differential expression results, the authors then selected FAM225A, based on these criteria:

  • FAM225A was one of the most upregulated lncRNAs in NPC.
  • FAM225A significantly associated with poor survival in NPC.
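
The cutoff quoted above (fold change ≥2 and P <0.05) can be sketched as a simple filter. This is an illustration only, not the authors' pipeline: it assumes group means on the linear scale, treats up- and down-regulation symmetrically, and takes the P value as given rather than computing it. The function name and the demo rows are made up.

```python
import math


def is_dysregulated(mean_tumor, mean_normal, p_value, min_fold=2.0, max_p=0.05):
    """True when the fold change is >= min_fold in either direction
    and the P value is below max_p.

    Means must be positive and on the linear (not log) scale.
    """
    abs_log2_fold = abs(math.log2(mean_tumor / mean_normal))
    return abs_log2_fold >= math.log2(min_fold) and p_value < max_p


# Made-up rows: (name, tumor mean, normal mean, P value)
rows = [
    ("lnc_up", 8.0, 4.0, 0.01),     # 2-fold up, significant
    ("lnc_down", 2.0, 8.0, 0.03),   # 4-fold down, significant
    ("lnc_flat", 5.0, 4.0, 0.001),  # significant but below 2-fold
    ("lnc_noisy", 9.0, 3.0, 0.2),   # 3-fold up but not significant
]
hits = [name for name, t, n, p in rows if is_dysregulated(t, n, p)]
print(hits)
```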


This study used the Arraystar Human LncRNA microarray V2.0 (Agilent_033010 Probe Name version); the dataset is GSE95166.

GSM2498136 T_1
GSM2498137 T_2
GSM2498138 T_3
GSM2498139 T_4
GSM2498140 I_1
GSM2498141 I_2
GSM2498142 I_3
GSM2498143 I_4

Its experimental design is the same as that of the earlier dataset GSE126683, so the two sets of results can be compared.
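
One simple way to compare two such results is to intersect their lists of differentially expressed lncRNAs. The sketch below assumes both lists have already been mapped to a common identifier such as gene symbol, since the two array versions use different probe names. The function name and the identifiers in the demo are placeholders, not real results.

```python
def compare_hits(hits_a, hits_b):
    """Return the shared identifiers (sorted) and the Jaccard index."""
    a, b = set(hits_a), set(hits_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Placeholder identifiers standing in for the two differential lists.
hits_dataset_1 = ["lncA", "lncB", "lncC"]
hits_dataset_2 = ["lncB", "lncC", "lncD"]
shared, jaccard = compare_hits(hits_dataset_1, hits_dataset_2)
print(shared, jaccard)
```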


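Published in Biomed Res Int. 2015: Long Noncoding RNA Expression Signatures of Metastatic Nasopharyngeal Carcinoma and Their Prognostic Value. Because it was published fairly early, it used the Human lncRNA Array v2.0 (8 × 60 K, Arraystar). The reported statistical result is: 8,088 lncRNAs were found to be significantly differentially expressed (≥2-fold). This paper did not upload its expression matrix to the GEO database; it provided the matrix as a supplementary Excel file instead, so it can also be reanalyzed.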



Experimental design: We collected 25 primary NK-NPCs and 8 nasopharynx tissues obtained from patients with inflamed nasopharyngeal mucosa. mRNA expression profiling was performed followed by bioinformatics analysis.

The array platform is a bit unusual: GPL8380, Capitalbio 22K Human oligo array version 1.0

This dataset has also been mined by others: NPC (GSE40290), 573 genes and 3,711 genes (green) were differentially expressed in high-TRIM26 NPC and low-TRIM26 NPC. That paper was published 28 June 2018: https://doi.org/10.1002/cam4.1537



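  • Lecture 1: GEO, expression microarrays, and R
  • Lecture 2: Downloading data from GEO to obtain an expression matrix
  • Lecture 3: Analyzing the expression matrix with the GSEA software
  • Lecture 4: Differential expression analysis based on group information
  • Lecture 5: GO/KEGG enrichment analysis of the differentially expressed genes using the hypergeometric test
  • Lecture 6: Boxplots of a chosen gene by group, and heatmaps of a chosen gene list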
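
The hypergeometric test named in the fifth lecture can be sketched in a few lines. This is a generic illustration of the upper-tail test, not the lecture's own code: the function name and the numbers in the demo are made up, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric P value, P(X >= k).

    N: background genes, K: genes in the GO/KEGG term,
    n: differentially expressed genes, k: overlap between the two.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Made-up numbers: 20000 background genes, a 100-gene term,
# 500 differentially expressed genes, 10 of them in the term.
p = hypergeom_enrichment_p(N=20000, K=100, n=500, k=10)
print(p)
```

In practice this would be repeated for every term, and the resulting P values adjusted for multiple testing.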

