Toward a gold standard for benchmarking gene set enrichment analysis
MOTIVATION: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and...
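Main authors: | Geistlinger, Ludwig; Csaba, Gergely; Santarelli, Mara; Ramos, Marcel; Schiffer, Lucas; Turaga, Nitesh; Law, Charity; Davis, Sean; Carey, Vincent; Morgan, Martin; Zimmer, Ralf; Waldron, Levi |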
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Problem Solving Protocol |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820859/ https://www.ncbi.nlm.nih.gov/pubmed/32026945 http://dx.doi.org/10.1093/bib/bbz158 |
_version_ | 1783639299552444416 |
---|---|
author | Geistlinger, Ludwig; Csaba, Gergely; Santarelli, Mara; Ramos, Marcel; Schiffer, Lucas; Turaga, Nitesh; Law, Charity; Davis, Sean; Carey, Vincent; Morgan, Martin; Zimmer, Ralf; Waldron, Levi |
collection | PubMed |
description | MOTIVATION: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets. RESULTS: We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a curated compendium of 75 expression datasets investigating 42 human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods, identifying significant differences in runtime and applicability to RNA-seq data, fraction of enriched gene sets depending on the null hypothesis tested and recovery of the predefined relevance rankings. We make practical recommendations on how methods originally developed for microarray data can efficiently be applied to RNA-seq data, how to interpret results depending on the type of gene set test conducted and which methods are best suited to effectively prioritize gene sets with high phenotype relevance. AVAILABILITY: http://bioconductor.org/packages/GSEABenchmarkeR CONTACT: ludwig.geistlinger@sph.cuny.edu |
format | Online Article Text |
id | pubmed-7820859 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7820859 2021-01-27 Brief Bioinform Problem Solving Protocol Oxford University Press 2020-03-09 /pmc/articles/PMC7820859/ /pubmed/32026945 http://dx.doi.org/10.1093/bib/bbz158 Text en © The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Toward a gold standard for benchmarking gene set enrichment analysis |
topic | Problem Solving Protocol |