Toward a gold standard for benchmarking gene set enrichment analysis

MOTIVATION: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and...

Bibliographic Details
Main authors: Geistlinger, Ludwig; Csaba, Gergely; Santarelli, Mara; Ramos, Marcel; Schiffer, Lucas; Turaga, Nitesh; Law, Charity; Davis, Sean; Carey, Vincent; Morgan, Martin; Zimmer, Ralf; Waldron, Levi
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820859/
https://www.ncbi.nlm.nih.gov/pubmed/32026945
http://dx.doi.org/10.1093/bib/bbz158
author Geistlinger, Ludwig
Csaba, Gergely
Santarelli, Mara
Ramos, Marcel
Schiffer, Lucas
Turaga, Nitesh
Law, Charity
Davis, Sean
Carey, Vincent
Morgan, Martin
Zimmer, Ralf
Waldron, Levi
author_sort Geistlinger, Ludwig
collection PubMed
description MOTIVATION: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets. RESULTS: We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a curated compendium of 75 expression datasets investigating 42 human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods, identifying significant differences in runtime and applicability to RNA-seq data, fraction of enriched gene sets depending on the null hypothesis tested and recovery of the predefined relevance rankings. We make practical recommendations on how methods originally developed for microarray data can efficiently be applied to RNA-seq data, how to interpret results depending on the type of gene set test conducted and which methods are best suited to effectively prioritize gene sets with high phenotype relevance. AVAILABILITY: http://bioconductor.org/packages/GSEABenchmarkeR CONTACT: ludwig.geistlinger@sph.cuny.edu
format Online
Article
Text
id pubmed-7820859
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7820859 2021-01-27 Toward a gold standard for benchmarking gene set enrichment analysis Brief Bioinform Problem Solving Protocol Oxford University Press 2020-03-09 /pmc/articles/PMC7820859/ /pubmed/32026945 http://dx.doi.org/10.1093/bib/bbz158 Text en © The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Toward a gold standard for benchmarking gene set enrichment analysis
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820859/
https://www.ncbi.nlm.nih.gov/pubmed/32026945
http://dx.doi.org/10.1093/bib/bbz158