Teaching Myself miRNA-seq Analysis, Part 1: Choosing and Reading a Paper

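A few days ago, while browsing the Biostars forum, I came across a question about miRNA analysis. The asker had downloaded the raw data for a published paper from NCBI's SRA and got stuck while processing it. The data he listed came from an Ion Torrent sequencer, and since I had never worked with miRNA-seq data before, I set aside some time to teach myself. I already have a background in RNA-seq, so it was fairly easy to pick up. I am recording my learning process here in the hope that it helps those who come after me.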

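The paper I chose was published in 2014. The authors stimulated human iPSC-derived cardiomyocytes (hiPSC-CMs) with ET-1 and looked at how miRNA and mRNA expression changed before and after stimulation. I did not look closely at the biological significance; I am reading the paper purely from a data analysis angle. mRNA expression was measured on the Affymetrix Human Genome U133 Plus 2.0 Array, which is especially easy to analyze: get the expression matrix, then use the limma package to find differentially expressed genes. The miRNA analysis is a bit more troublesome. The authors used an Ion Torrent sequencer, but the data you download from the SRA is in FASTQ format with the adapters already removed, so there is no real need to worry about platform-specific quirks here.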

Here is the information I collected about the paper:

## paper : http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0108051

## Aggarwal P, Turner A, Matter A, Kattman SJ et al. RNA expression profiling of human iPSC-derived cardiomyocytes in a cardiac hypertrophy model. PLoS One 2014;9(9):e108051. PMID: 25255322

## The accession numbers are 1. SuperSeries (mRNA+miRNA) - GSE60293

## 2. mRNA expression array - GSE60291 (Affymetrix Human Genome U133 Plus 2.0 Array)

## 3. miRNA-Seq - GSE60292 (Ion Torrent)

## GEO : http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292

## FTP : ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP045/SRP045420

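Look carefully at which analyses the paper performed; only then can you imitate them and obtain the same results.

The paper's data processing workflow was: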
Ion Torrent's Torrent Suite version 3.6 was used for basecalling
Raw sequencing reads were aligned using the SHRiMP2 aligner and were aligned against the human reference genome (hg19) for novel miRNA prediction and then against a custom reference sequence file containing miRBase v.20 known human miRNA hairpins, tRNA, rRNA, adapter sequences and predicted novel miRNA sequences.(Genome_build: hg19, miRBase v.20 human miRNA hairpins)

The miRDeep2 package (default parameters) was used to predict novel (as yet undescribed) miRNAs

Alignments with less than 17 bp matches and a custom 3′ end phred q-score threshold of 17 were filtered out.

miRNA quantification was done using HTSeq v0.5.3p3 using the default union parameter.
Differential miRNA expression was analyzed using the DESeq (v.1.12.1) R/Bioconductor package

In this study, differentially expressed genes that had a false discovery rate cutoff at 10% (FDR <= 0.1), a log₂ fold change greater than 1.5 and less than −1.5 were considered significant.

Target gene prediction was performed using the TargetScan (version 6.2) database

We also used miRTarBase (version 4.3), to identify targets that have been experimentally validated

## miRDeep2 and miReap ## predict the exact precursor sequence from the mature sequence.
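To make the alignment-length filter in the workflow above more concrete, here is a minimal sketch. It assumes the alignments are available as SAM records and reads "17 bp matches" as the number of read bases that the CIGAR string places on the reference (operations `M`, `=` and `X`, so mismatches inside an `M` run are counted too). That reading is my assumption, not something the paper spells out. The 3′ end quality threshold is left out because the paper does not say how it was applied. The function names and the example record are made up for illustration.

```python
import re

MIN_ALIGNED_BASES = 17  # threshold quoted in the paper's workflow

# One CIGAR operation: a length followed by an operation letter (SAM spec).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(cigar: str) -> int:
    """Count read bases aligned to the reference (CIGAR ops M, = and X)."""
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "M=X")


def keep_alignment(sam_line: str) -> bool:
    """Return True if a SAM record passes the (assumed) length filter."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    # Unmapped reads (FLAG bit 0x4) or records without a CIGAR never pass.
    if flag & 0x4 or cigar == "*":
        return False
    return aligned_bases(cigar) >= MIN_ALIGNED_BASES


# Hypothetical record: read r1 aligned to a hairpin with 22 aligned bases.
example = "r1\t0\thsa-mir-21\t8\t255\t22M\t*\t0\t0\t" + "A" * 22 + "\t" + "I" * 22
print(keep_alignment(example))
```

Counting from the CIGAR string keeps the sketch independent of any aligner-specific tags; a stricter version could subtract mismatches if the aligner reports them.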

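The paper covers the FASTQ quality-control criteria, the alignment tool, the reference sequences aligned against (two alignment routes), miRNA quantification, novel miRNA prediction, and miRNA target gene prediction. This is also the standard routine for learning miRNA-seq data analysis. The authors provide all of their analysis results, so we can work through the steps ourselves and reproduce their analysis.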
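The significance criterion quoted in the workflow (FDR at most 0.1, log₂ fold change above 1.5 or below −1.5) is easy to express in code. The sketch below is not the authors' script; it only applies those two cutoffs to a list of results. The `DEResult` type, its field names, and the example values are placeholders I made up, and a missing adjusted p-value is treated as not significant.

```python
from typing import List, NamedTuple, Optional


class DEResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    mirna: str
    log2_fold_change: float
    padj: Optional[float]  # adjusted p-value (FDR); None if not available


def is_significant(row: DEResult, fdr_cutoff: float = 0.1,
                   lfc_cutoff: float = 1.5) -> bool:
    """Apply the paper's cutoffs: FDR <= 0.1 and |log2 fold change| > 1.5."""
    if row.padj is None:
        return False
    return row.padj <= fdr_cutoff and abs(row.log2_fold_change) > lfc_cutoff


def significant_mirnas(rows: List[DEResult]) -> List[str]:
    """Names of the rows that pass both cutoffs, in input order."""
    return [row.mirna for row in rows if is_significant(row)]


# Made-up values, only to exercise each branch of the filter.
rows = [
    DEResult("miRNA_A", 2.3, 0.01),    # up-regulated, passes
    DEResult("miRNA_B", -1.8, 0.10),   # down-regulated, FDR exactly at cutoff
    DEResult("miRNA_C", 1.2, 0.001),   # fold change too small
    DEResult("miRNA_D", 3.0, 0.25),    # FDR too high
    DEResult("miRNA_E", -2.5, None),   # no adjusted p-value
]
print(significant_mirnas(rows))
```

Using the absolute value covers both directions of change with a single comparison.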

Supplementary_files_format_and_content: tab-delimited text files containing raw read counts for known mature human miRNAs. (the expression matrix)

We detected 836 known human mature miRNAs in the control-CMs and 769 in the ET1-CMs
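As a rough way to check numbers like these against the raw count tables, the sketch below counts how many miRNAs are "detected" in a group of samples. It assumes a tab-delimited table with one row per miRNA and one column per sample, and it defines "detected" as at least one read in any sample of the group. Both the column names and that definition are my assumptions; the paper may have used a different rule. The tiny table is made up.

```python
import csv
import io
from typing import Iterable


def count_detected(table_text: str, sample_columns: Iterable[str],
                   min_reads: int = 1) -> int:
    """Count miRNAs with >= min_reads in at least one of the given samples."""
    columns = list(sample_columns)
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    detected = 0
    for row in reader:
        if any(int(row[col]) >= min_reads for col in columns):
            detected += 1
    return detected


# Made-up raw count table with hypothetical sample names.
EXAMPLE = (
    "miRNA\tcontrol_1\tcontrol_2\tET1_1\tET1_2\n"
    "miRNA_A\t10\t0\t0\t0\n"
    "miRNA_B\t0\t3\t5\t1\n"
    "miRNA_C\t0\t0\t0\t0\n"
    "miRNA_D\t0\t0\t2\t0\n"
    "miRNA_E\t7\t1\t0\t0\n"
)

print(count_detected(EXAMPLE, ["control_1", "control_2"]))  # 3
print(count_detected(EXAMPLE, ["ET1_1", "ET1_2"]))          # 2
```

For a real file you would pass its contents (for example from `open(path).read()`) and the actual sample column names from the header.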

Based on our miRNA-Seq data, we predicted 506 sequences to be potentially novel, as yet undescribed miRNAs.

In order to validate the expression profiles of the miRNAs detected, we performed RT-qPCR on a subset of five known human mature and five of our predicted novel miRNAs.

We obtained a total of 1,922 predicted miRNA-mRNA pairs represented by 309 genes and 174 known mature human miRNAs.

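Of course, a routine analysis alone is not enough to get published, so the authors combined the miRNA and mRNA data in a network analysis, ran a small number of wet-lab experiments for validation, and closed with some discussion of biological significance. Naturally, a largely computational analysis like this is hard to tie to grand ambitions of curing disease.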

In the next post I will go over the reference materials I collected while teaching myself miRNA-seq analysis.
