Cargando…

Sensitivity, specificity, and reproducibility of RNA-Seq differential expression calls

BACKGROUND: The MAQC/SEQC consortium has recently compiled a key benchmark that can serve for testing the latest developments in analysis tools for microarray and RNA-seq expression profiling. Such objective benchmarks are required for basic and applied research, and can be critical for clinical and...

Descripción completa

Detalles Bibliográficos
Autores principales: Łabaj, Paweł P., Kreil, David P.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5168849/
https://www.ncbi.nlm.nih.gov/pubmed/27993156
http://dx.doi.org/10.1186/s13062-016-0169-7
Descripción
Sumario:BACKGROUND: The MAQC/SEQC consortium has recently compiled a key benchmark that can serve for testing the latest developments in analysis tools for microarray and RNA-seq expression profiling. Such objective benchmarks are required for basic and applied research, and can be critical for clinical and regulatory outcomes. Going beyond the first comparisons presented in the original SEQC study, we here present extended benchmarks including effect strengths typical of common experiments. RESULTS: With artefacts removed by factor analysis and additional filters, for genome scale surveys, the reproducibility of differential expression calls typically exceed 80% for all tool combinations examined. This directly reflects the robustness of results and reproducibility across different studies. Similar improvements are observed for the top ranked candidates with the strongest relative expression change, although here some tools clearly perform better than others, with typical reproducibility ranging from 60 to 93%. CONCLUSIONS: In our benchmark of alternative tools for RNA-seq data analysis we demonstrated the benefits that can be gained by analysing results in the context of other experiments employing a reference standard sample. This allowed the computational identification and removal of hidden confounders, for instance, by factor analysis. In itself, this already substantially improved the empirical False Discovery Rate (eFDR) without changing the overall landscape of sensitivity. Further filtering of false positives, however, is required to obtain acceptable eFDR levels. Appropriate filters noticeably improved agreement of differentially expressed genes both across sites and between alternative differential expression analysis pipelines. REVIEWERS: An extended abstract of this research paper was selected for the Camda Satellite Meeting to Ismb 2015 by the Camda Programme Committee. The full research paper then underwent one round of Open Peer Review under a responsible Camda Programme Committee member, Lan Hu, PhD (Bio-Rad Laboratories, Digital Biology Center-Cambridge). Open Peer Review was provided by Charlotte Soneson, PhD (University of Zürich) and Michał Okoniewski, PhD (ETH Zürich). The Reviewer Comments section shows the full reviews and author responses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13062-016-0169-7) contains supplementary material, which is available to authorized users.