Expression matrices are not always uploaded to GEO or ArrayExpress

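I have recently been systematically compiling single-cell transcriptome atlas projects and came across an interesting way of sharing data: the 2018 mouse single-cell atlas, titled "Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris" (https://www.nature.com/articles/s41586-018-0590-4).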
Authors: Tabula Muris Consortium · 2018 · cited 480 times as of 2021-06-11
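The paper describes Tabula Muris, an open-access database built by researchers at Stanford University, the Chan Zuckerberg Biohub, and UCSF. It mainly consists of transcriptomic profiles of more than 100,000 single cells from 20 mouse organs and tissues, together with comparisons of gene expression across tissues and cell types.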

Shared via figshare

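figshare lets researchers upload figures, media, posters, papers (including preprints), multi-file filesets, datasets and more, offering a file-sharing model that traditional academic publishing lacks. Data are shared under Creative Commons licenses, which reduces copyright disputes and lets scientists worldwide access and share the information.
The paper gives two data-sharing links: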

  • https://doi.org/10.6084/m9.figshare.5715040 for FACS/Smart-seq2 data
  • https://doi.org/10.6084/m9.figshare.5715025 for 10x data
Moreover, a dataset this well known also has ready-made data objects in R's Bioconductor (the TabulaMurisData package): https://bioconductor.org/packages/devel/data/experiment/vignettes/TabulaMurisData/inst/doc/TabulaMurisData.html

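    # Load the packages without their startup messages
    suppressPackageStartupMessages({
      library(ExperimentHub)
      library(SingleCellExperiment)
      library(TabulaMurisData)
    })
    #> snapshotDate(): 2021-05-05
    # Connect to ExperimentHub and list the records provided by TabulaMurisData
    eh <- ExperimentHub()
    #> snapshotDate(): 2021-05-05
    query(eh, "TabulaMurisData")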
    #> ExperimentHub with 2 records
    #> # snapshotDate(): 2021-05-05
    #> # retrieve records with, e.g., 'object[["EH1617"]]'
    #>
    #> title 
    #> EH1617 | TabulaMurisDroplet 
    #> EH1618 | TabulaMurisSmartSeq2
    

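As you can see, these are again two separate expression matrices (droplet and Smart-seq2). They follow the SummarizedExperiment school rather than the Seurat school, so they come with their own set of object conventions and even their own interactive web tool, iSEE ("iSEE: Interactive SummarizedExperiment Explorer." F1000Research, 7, 741 (2018). doi: 10.12688/f1000research.14966.1).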

Shared purely as supplementary files

For example, the paper "Progressive immune dysfunction with advancing disease stage in renal cell carcinoma" (Cancer Cell, 2021 Mar 11, doi: 10.1016/j.ccell.2021.02.013) shares its data only as supplementary files:

Supplementary Data S1

ScRNA-seq raw count matrix (part 1 of 2), after quality control filtering, with genes as rows and cell barcodes as columns, related to Figures 1–6, S1–S3, and S5.
NIHMS1692222-supplement-supplementary_Data_S1.zip (143M): a zip archive that unpacks to a CSV file of more than 5 GB, with over 30,000 gene rows.
GUID: 217E8B40-EB49-4FF5-AEF5-57BBBA4DAE61
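Loading a dense CSV of more than 5 GB with a single read.csv() call can need far more memory than a typical workstation has, even though most entries of a single-cell count matrix are typically zero. Below is a minimal R sketch, not the authors' code, that streams the file a few thousand gene rows at a time and keeps only the non-zero entries in a sparse matrix from the Matrix package. The helper name read_dense_csv_as_sparse() and the default chunk size are hypothetical; it assumes the layout stated in the caption (genes as rows, cell barcodes as columns, gene names in the first column) and that the archive has already been unzipped.

```r
library(Matrix)  # sparse matrix classes; ships with standard R installations

# Hypothetical helper (not from the paper): stream a dense genes x cells CSV
# into a sparse matrix, parsing `chunk_size` gene rows at a time.
read_dense_csv_as_sparse <- function(path, chunk_size = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Header line: cell barcodes, possibly preceded by a label for the gene column
  header <- scan(text = readLines(con, n = 1L), what = character(),
                 sep = ",", quiet = TRUE)
  genes <- character()
  blocks <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    chunk <- read.csv(text = lines, header = FALSE, stringsAsFactors = FALSE)
    genes <- c(genes, as.character(chunk[[1]]))
    counts <- as.matrix(chunk[, -1, drop = FALSE])
    # Keep only the non-zero entries of this chunk
    nz <- which(counts != 0, arr.ind = TRUE)
    blocks[[length(blocks) + 1L]] <- sparseMatrix(
      i = nz[, 1], j = nz[, 2], x = counts[nz], dims = dim(counts))
  }
  stopifnot(length(blocks) > 0L)
  mat <- do.call(rbind, blocks)
  # The barcodes are the last ncol(mat) header fields
  dimnames(mat) <- list(genes, tail(header, ncol(mat)))
  mat
}
```

Only one chunk is held densely at a time, so peak memory scales with chunk_size rather than with the whole file; a larger chunk_size trades memory for fewer parsing calls.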

Supplementary Data S2

ScRNA-seq raw count matrix (part 2 of 2), after quality control filtering, with genes as rows and cell barcodes as columns, related to Figures 1–6, S1–S3, and S5.
NIHMS1692222-supplement-supplementary_Data_S2.csv (1.7G): only a little over 10,000 gene rows.
GUID: 34477B69-0F73-4D9A-B926-66981E1D5D4A
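The two captions do not say along which axis the matrix was split, and the two parts have different numbers of gene rows. The following is a minimal sketch of a hypothetical helper, combine_count_parts(), that handles both readings: if the parts share the same cell barcodes it stacks their gene rows, otherwise it treats them as different cells and binds them column-wise on the genes they have in common. Both branches are assumptions about the file layout, not something the paper documents.

```r
# Hypothetical helper (not from the paper): combine two parts of a
# genes x cells count matrix when the split axis is not documented.
# Works for base matrices and for Matrix sparse matrices alike.
combine_count_parts <- function(part1, part2) {
  if (setequal(colnames(part1), colnames(part2))) {
    # Same cell barcodes in both parts: assume the split was by genes.
    stopifnot(length(intersect(rownames(part1), rownames(part2))) == 0L)
    # Put part2's columns in part1's barcode order before stacking rows.
    return(rbind(part1, part2[, colnames(part1), drop = FALSE]))
  }
  # Different barcodes: assume the split was by cells (no cell in both parts).
  stopifnot(length(intersect(colnames(part1), colnames(part2))) == 0L)
  # Keep only the genes present in both parts, in part1's order.
  shared <- intersect(rownames(part1), rownames(part2))
  message(sprintf("Keeping %d shared genes; dropping %d and %d part-specific genes",
                  length(shared), nrow(part1) - length(shared),
                  nrow(part2) - length(shared)))
  cbind(part1[shared, , drop = FALSE], part2[shared, , drop = FALSE])
}
```

Intersecting the genes, rather than filling genes absent from one part with zeros, is deliberate: a gene missing from a file is unknown for those cells, not necessarily unexpressed.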
The paper describes its single-cell experimental design clearly: "We performed single-cell RNA and T cell receptor sequencing (scRNA-seq/scTCR-seq) on 164,722 individual cells from tumor and adjacent non-tumor tissue in patients with ccRCC across disease stages – early, locally advanced, and advanced/metastatic."
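One way to see whether the supplementary files cover all of these cells is to count the distinct cell barcodes in their headers and compare the total with 164,722, without loading the multi-GB files: only the header line and the first data row of each file are read. The helpers csv_barcodes() and check_cell_total() below are hypothetical; the sketch assumes every data row starts with a gene name and that barcodes are unique across samples. Because the captions describe the matrices as post-QC, a shortfall alone cannot tell cells removed by filtering apart from cells missing from the upload.

```r
# Hypothetical helper (not from the paper): read only the first two lines of a
# genes x cells CSV and return its cell barcodes, so multi-GB files are never
# loaded in full.
csv_barcodes <- function(path) {
  first_two <- readLines(path, n = 2L)
  header <- scan(text = first_two[1], what = character(), sep = ",", quiet = TRUE)
  first_row <- scan(text = first_two[2], what = character(), sep = ",", quiet = TRUE)
  # Each data row is a gene name followed by one count per cell; the header
  # may or may not carry an extra label for the gene column.
  n_cells <- length(first_row) - 1L
  tail(header, n_cells)
}

# Count distinct barcodes across all parts and compare with the reported total.
# Assumes barcodes are unique across samples (e.g. prefixed by a sample ID).
check_cell_total <- function(paths, reported = 164722L) {
  found <- length(unique(unlist(lapply(paths, csv_barcodes))))
  list(found = found, reported = reported, complete = found >= reported)
}
```

Taking the union of barcodes keeps the count correct whichever way the matrix was split: if both parts hold the same cells, they are counted once. Since readLines() also accepts a connection, csv_barcodes() should also work on a connection opened into the zip archive with unz(), without unpacking it first.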
To my disappointment, however, the CSV files provided as supplements are incomplete!!!
Why not just upload the data to GEO or ArrayExpress properly?
