Sequencing data in dbGaP can, of course, be requested successfully

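Normally, raw sequencing data are uploaded to NCBI's SRA database and are then automatically mirrored at EBI. You need to be familiar with the GEO and SRA databases: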

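Downloading sra files with NCBI's prefetch command is usually too slow. As an alternative, see the post "Downloading fastq sequencing data directly from the EBI database". You need to set up the tools yourself and then prepare fq.txt, a file listing the download paths you found by searching EBI: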

The script is as follows:

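# conda activate download
# Set up your own conda environment named "download" first; it provides ascp.
# fq.txt holds one EBI Aspera path per line.
while read -r id
do
    ascp -QT -l 300m -P33001 \
        -i ~/miniconda3/envs/download/etc/asperaweb_id_dsa.openssh \
        "era-fasp@$id" .
done < fq.txt
# Save as step1-aspera.sh and run it in the background:
# nohup bash step1-aspera.sh 1>step1-aspera.log 2>&1 &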

This script reads the paths listed in fq.txt, which you collected by searching EBI, and downloads the corresponding fastq files in batch.
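One way to build fq.txt without copying paths by hand is to ask the ENA Portal API for a file report and keep only the Aspera paths. The sketch below is an assumption rather than the method used in this post: the helper name `filereport_to_fqlist` is made up for illustration, and it assumes the report is a tab-separated table with a header column called `fastq_aspera`, in which paired-end files are separated by `;`.

```shell
# Hypothetical helper: read an ENA file report (TSV) on stdin and
# print one Aspera path per line, ready to be used as fq.txt.
# Assumption: the header row has a column named "fastq_aspera" and
# multiple files for one run are joined with ';'.
filereport_to_fqlist() {
  awk -F'\t' '
    # Locate the fastq_aspera column from the header row.
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "fastq_aspera") col = i
      next
    }
    # Split each non-empty cell on ";" and print every path on its own line.
    col && $col != "" {
      n = split($col, parts, ";")
      for (j = 1; j <= n; j++) print parts[j]
    }
  '
}

# Example usage (PRJNA000000 is a placeholder accession, not a real project):
# curl -s 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=PRJNA000000&result=read_run&fields=run_accession,fastq_aspera&format=tsv' \
#   | filereport_to_fqlist > fq.txt
```

Each output line should look like `fasp.sra.ebi.ac.uk:/vol1/fastq/...`, which is the form the download loop expects after `era-fasp@`. It is worth checking the first few lines of fq.txt before starting a long batch download.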

Databases that require authorization to access

Sometimes, however, researchers choose not to make their data fully open.

Take, for example, the paper published in Cancer Cell (2021 May 10) titled "Progressive immune dysfunction with advancing disease stage in renal cell carcinoma". I looked up their data at: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002252.v1.p1

Three sequencing technologies were used:

  • Single-cell RNA sequencing (scRNA-seq)
  • TCR sequencing (scTCR-seq)
  • Whole exome sequencing

13 patients:

  • 4 with early stage disease (stage I/II),
  • 4 with locally advanced disease (stage III),
  • 5 with advanced/metastatic disease (stage IV)

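Looking closely at the page, the data are in dbGaP, are not public, and must be requested. Even so, two data access requests have already been approved.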

The first is:

Requestor: Turajlic, Samra
Affiliation: FRANCIS CRICK INSTITUTE, LTD
Project: Meta-analysis of single-cell sequencing data in clear cell renal cell carcinoma
Date of approval: 2021-05-27
Request status: approved

The second is:

Requestor: Van Allen, Eliezer
Affiliation: DANA-FARBER CANCER INST
Project: Whole exome and transcriptome predictors of response to immune checkpoint therapy for advanced cancers
Date of approval: 2021-05-11
Request status: approved

In the vast majority of cases there is no need to request the raw data

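There are already a dozen or so single-cell papers on ccRCC, and the other datasets have released their raw sequencing data, so there is no need to fixate on this one dataset!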

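In addition, this paper does provide its expression matrices. They are not in GEO, though; they are attached directly to the paper as supplementary files: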

  • Data S1. scRNA-seq raw count matrix (part 1 of 2), after quality control filtering, with genes as rows and cell barcodes as columns, related to Figures 1–6, S1–S3, and S5.

NIHMS1692222-supplement-supplementary_Data_S1.zip (143M)

GUID: 217E8B40-EB49-4FF5-AEF5-57BBBA4DAE61

  • Data S2. scRNA-seq raw count matrix (part 2 of 2), after quality control filtering, with genes as rows and cell barcodes as columns, related to Figures 1–6, S1–S3, and S5.

NIHMS1692222-supplement-supplementary_Data_S2.csv (1.7G)

GUID: 34477B69-0F73-4D9A-B926-66981E1D5D4A
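Because the second file is about 1.7G, it can help to check its dimensions before loading it into R or Python. The sketch below streams the file once and never holds it in memory. It rests on assumptions that are not confirmed by the paper: that the first row is a header of cell barcodes, that the first column holds gene names, and that no field contains a quoted comma. The function name `csv_dims` is made up for illustration.

```shell
# Hypothetical helper: print "<genes> <cells>" for a genes-by-cells CSV.
# Assumptions: row 1 is the header of cell barcodes, column 1 holds gene
# names, and no field contains an embedded comma.
csv_dims() {
  awk -F',' '
    NR == 1 { cols = NF - 1 }      # cells = header fields minus the gene-name column
    END     { print NR - 1, cols } # genes = total rows minus the header row
  ' "$1"
}

# Example usage:
# csv_dims NIHMS1692222-supplement-supplementary_Data_S2.csv
```

If the reported number of cells differs between part 1 and part 2, the two parts are probably split by cell rather than by gene, which affects how they should be combined.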
