Immune gene sets can be this complicated

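I used to know only that CIBERSORT ships with the built-in LM22 gene set. CIBERSORT is a method published in Nature Methods in 2015, and the tool is available at http://cibersort.stanford.edu. The method directly spawned a whole series of data-mining papers, as you will see if you search with the keywords CIBERSORT + bioinformatics: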

The number of published papers is staggering

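It is hard to work out who first applied this data-mining recipe, but an early example was published in 2016: Patterns of Immune Infiltration in Breast Cancer and Their Clinical Implications: A Gene-Expression-Based Retrospective Study. PLOS Medicine 13, e1002194. The authors used the CIBERSORT algorithm to infer the proportions of 22 immune cell types in 11,000 breast tumors (tissue transcriptomes from microarrays or RNA-seq, including GEO and TCGA).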

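Then 2018-2020 was a boom period. For lung, kidney, colorectal and liver cancer alike, there are a dozen or more nearly identical papers each, all taking TCGA transcriptome data and reporting the proportions of the 22 immune cell types.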

For example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7399742/
Title: "Immune Cell Infiltration and Identifying Genes of Prognostic Value in the Papillary Renal Cell Carcinoma Microenvironment by Bioinformatics Analysis"
Two datasets:
- TCGA website for kidney renal papillary cell carcinoma projects (TCGA-KIRP)
- Expression microarrays: GSE7023 and GSE2748
Two algorithms:
- Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE)
- Cell-type Identification By Estimating Relative Subsets Of known RNA Transcripts (CIBERSORT)
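
To make concrete what "estimating the proportions of immune cell types" means, here is a minimal sketch of the deconvolution idea behind tools like CIBERSORT. It treats a bulk expression profile as a weighted sum of cell-type reference profiles and solves for the weights. Important caveats: CIBERSORT itself uses nu-support vector regression on the LM22 matrix, whereas this sketch swaps in plain non-negative least squares solved by multiplicative updates. The 4-gene, 3-cell-type signature matrix, the cell-type names, and the function names `mix` and `deconvolve` are all made up for illustration.

```python
# Toy signature matrix: rows are genes, columns are cell types.
# All values are hypothetical; the real LM22 matrix has 22 columns.
CELL_TYPES = ["type_A", "type_B", "type_C"]
SIGNATURE = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 6.0],
    [2.0, 2.0, 2.0],
]


def mix(signature, fractions):
    """Simulate a bulk profile as signature matrix times cell fractions."""
    return [sum(s * f for s, f in zip(row, fractions)) for row in signature]


def deconvolve(signature, mixture, n_iter=500):
    """Estimate non-negative cell fractions by least squares.

    Uses multiplicative updates x_j <- x_j * (S^T m)_j / (S^T S x)_j,
    which keep every x_j positive when all inputs are positive.
    This is NOT the nu-SVR procedure used by CIBERSORT.
    """
    n_types = len(signature[0])
    # Precompute S^T m and S^T S once.
    stm = [sum(row[j] * m for row, m in zip(signature, mixture))
           for j in range(n_types)]
    sts = [[sum(row[j] * row[k] for row in signature)
            for k in range(n_types)]
           for j in range(n_types)]
    x = [1.0 / n_types] * n_types  # start from equal fractions
    for _ in range(n_iter):
        denom = [sum(sts[j][k] * x[k] for k in range(n_types))
                 for j in range(n_types)]
        x = [x[j] * stm[j] / denom[j] for j in range(n_types)]
    total = sum(x)
    return [v / total for v in x]  # rescale so the fractions sum to 1


true_fractions = [0.5, 0.3, 0.2]
bulk = mix(SIGNATURE, true_fractions)
estimated = deconvolve(SIGNATURE, bulk)
for name, value in zip(CELL_TYPES, estimated):
    print(f"{name}: {value:.3f}")
```

Because the toy bulk profile is built without noise, the estimate should land very close to the fractions used to simulate it. Real data is noisy and the reference profiles are correlated, which is part of why the published method uses a more robust regression.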

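Apart from the cancer type, data-mining papers like this are essentially identical. Some of them even share the same cancer type and differ only in the datasets. In extreme cases the datasets are the same too, which leaves one speechless.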

Not everyone has stopped thinking

For example, the paper "Clinical significance and immunogenomic landscape analyses of the immune cell signature based prognostic model for patients with breast cancer", published in Briefings in Bioinformatics in July 2021 (https://doi.org/10.1093/bib/bbaa311), no longer confines itself to the LM22 gene set built into CIBERSORT. Instead, the authors drew on a large body of literature, as follows:

The 184 immune cell signatures were collected from diverse resources through an extensive literature search on the website. Of them,

25 signatures were obtained from the work of Bindea et al. [26],
68 signatures were obtained from the work of Wolf et al. [27],
17 signatures were downloaded from the ImmPort database [28],
24 T cell signatures were downloaded from the work of Miao et al. [29]
22, 10 and 10 signatures were obtained from CIBERSORT [16], MCP-Counter (R package, version 1.1) [30] and ImSig (R package, version 1.0.0) [31], respectively.

More detailed information is listed in the supplementary material and Supplementary Table S1–Supplementary Table S4.
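
As a quick sanity check on the excerpt, the per-source counts can simply be added up. The sketch below uses only the numbers quoted above; the variable names are my own.

```python
# Signature counts per source, copied from the quoted excerpt.
counts_by_source = {
    "Bindea et al.": 25,
    "Wolf et al.": 68,
    "ImmPort": 17,
    "Miao et al. (T cell)": 24,
    "CIBERSORT": 22,
    "MCP-Counter": 10,
    "ImSig": 10,
}
stated_total = 184  # total given in the first sentence of the excerpt

itemized_total = sum(counts_by_source.values())
not_itemized = stated_total - itemized_total

for source, n in counts_by_source.items():
    print(f"{source}: {n}")
print(f"itemized: {itemized_total}, stated: {stated_total}, "
      f"not itemized here: {not_itemized}")
```

The sources itemized in the excerpt add up to 176, so 8 of the 184 signatures are not accounted for in this quote. Where they come from is something to check against the paper's supplementary tables rather than guess.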

If these gene sets interest you, you can read the cited papers yourself.

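Of course, to a large extent people go to all this trouble only because the low-hanging fruit of data mining has already been picked. If you do not put in the hard work, most of your code and figures can only be published on a WeChat official account.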

