Sensitivity, specificity, and reproducibility of RNA-Seq differential expression calls

BACKGROUND: The MAQC/SEQC consortium has recently compiled a key benchmark that can serve for testing the latest developments in analysis tools for microarray and RNA-seq expression profiling. Such objective benchmarks are required for basic and applied research, and can be critical for clinical and...

Bibliographic Details
Main Authors: Łabaj, Paweł P., Kreil, David P.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5168849/
https://www.ncbi.nlm.nih.gov/pubmed/27993156
http://dx.doi.org/10.1186/s13062-016-0169-7
_version_ 1782483424963985408
author Łabaj, Paweł P.
Kreil, David P.
author_facet Łabaj, Paweł P.
Kreil, David P.
author_sort Łabaj, Paweł P.
collection PubMed
description BACKGROUND: The MAQC/SEQC consortium has recently compiled a key benchmark that can serve for testing the latest developments in analysis tools for microarray and RNA-seq expression profiling. Such objective benchmarks are required for basic and applied research, and can be critical for clinical and regulatory outcomes. Going beyond the first comparisons presented in the original SEQC study, we here present extended benchmarks including effect strengths typical of common experiments. RESULTS: With artefacts removed by factor analysis and additional filters, for genome scale surveys, the reproducibility of differential expression calls typically exceeds 80% for all tool combinations examined. This directly reflects the robustness of results and reproducibility across different studies. Similar improvements are observed for the top ranked candidates with the strongest relative expression change, although here some tools clearly perform better than others, with typical reproducibility ranging from 60 to 93%. CONCLUSIONS: In our benchmark of alternative tools for RNA-seq data analysis we demonstrated the benefits that can be gained by analysing results in the context of other experiments employing a reference standard sample. This allowed the computational identification and removal of hidden confounders, for instance, by factor analysis. In itself, this already substantially improved the empirical False Discovery Rate (eFDR) without changing the overall landscape of sensitivity. Further filtering of false positives, however, is required to obtain acceptable eFDR levels. Appropriate filters noticeably improved agreement of differentially expressed genes both across sites and between alternative differential expression analysis pipelines. REVIEWERS: An extended abstract of this research paper was selected for the CAMDA Satellite Meeting to ISMB 2015 by the CAMDA Programme Committee.
The full research paper then underwent one round of Open Peer Review under a responsible CAMDA Programme Committee member, Lan Hu, PhD (Bio-Rad Laboratories, Digital Biology Center-Cambridge). Open Peer Review was provided by Charlotte Soneson, PhD (University of Zürich) and Michał Okoniewski, PhD (ETH Zürich). The Reviewer Comments section shows the full reviews and author responses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13062-016-0169-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5168849
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5168849 2016-12-28 Sensitivity, specificity, and reproducibility of RNA-Seq differential expression calls Łabaj, Paweł P. Kreil, David P. Biol Direct Research BACKGROUND: The MAQC/SEQC consortium has recently compiled a key benchmark that can serve for testing the latest developments in analysis tools for microarray and RNA-seq expression profiling. Such objective benchmarks are required for basic and applied research, and can be critical for clinical and regulatory outcomes. Going beyond the first comparisons presented in the original SEQC study, we here present extended benchmarks including effect strengths typical of common experiments. RESULTS: With artefacts removed by factor analysis and additional filters, for genome scale surveys, the reproducibility of differential expression calls typically exceeds 80% for all tool combinations examined. This directly reflects the robustness of results and reproducibility across different studies. Similar improvements are observed for the top ranked candidates with the strongest relative expression change, although here some tools clearly perform better than others, with typical reproducibility ranging from 60 to 93%. CONCLUSIONS: In our benchmark of alternative tools for RNA-seq data analysis we demonstrated the benefits that can be gained by analysing results in the context of other experiments employing a reference standard sample. This allowed the computational identification and removal of hidden confounders, for instance, by factor analysis. In itself, this already substantially improved the empirical False Discovery Rate (eFDR) without changing the overall landscape of sensitivity. Further filtering of false positives, however, is required to obtain acceptable eFDR levels. Appropriate filters noticeably improved agreement of differentially expressed genes both across sites and between alternative differential expression analysis pipelines.
REVIEWERS: An extended abstract of this research paper was selected for the CAMDA Satellite Meeting to ISMB 2015 by the CAMDA Programme Committee. The full research paper then underwent one round of Open Peer Review under a responsible CAMDA Programme Committee member, Lan Hu, PhD (Bio-Rad Laboratories, Digital Biology Center-Cambridge). Open Peer Review was provided by Charlotte Soneson, PhD (University of Zürich) and Michał Okoniewski, PhD (ETH Zürich). The Reviewer Comments section shows the full reviews and author responses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13062-016-0169-7) contains supplementary material, which is available to authorized users. BioMed Central 2016-12-20 /pmc/articles/PMC5168849/ /pubmed/27993156 http://dx.doi.org/10.1186/s13062-016-0169-7 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Łabaj, Paweł P.
Kreil, David P.
Sensitivity, specificity, and reproducibility of RNA-Seq differential expression calls
title Sensitivity, specificity, and reproducibility of RNA-Seq differential expression calls
title_full Sensitivity, specificity, and reproducibility of RNA-Seq differential expression calls
title_fullStr Sensitivity, specificity, and reproducibility of RNA-Seq differential expression calls
title_full_unstemmed Sensitivity, specificity, and reproducibility of RNA-Seq differential expression calls
title_short Sensitivity, specificity, and reproducibility of RNA-Seq differential expression calls
title_sort sensitivity, specificity, and reproducibility of rna-seq differential expression calls
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5168849/
https://www.ncbi.nlm.nih.gov/pubmed/27993156
http://dx.doi.org/10.1186/s13062-016-0169-7
work_keys_str_mv AT łabajpawełp sensitivityspecificityandreproducibilityofrnaseqdifferentialexpressioncalls
AT kreildavidp sensitivityspecificityandreproducibilityofrnaseqdifferentialexpressioncalls