
Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms

MOTIVATION: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests...


Bibliographic Details
Main authors: Zyla, Joanna; Marczyk, Michal; Domaszewska, Teresa; Kaufmann, Stefan H E; Polanska, Joanna; Weiner, January
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954644/
https://www.ncbi.nlm.nih.gov/pubmed/31165139
http://dx.doi.org/10.1093/bioinformatics/btz447
author Zyla, Joanna
Marczyk, Michal
Domaszewska, Teresa
Kaufmann, Stefan H E
Polanska, Joanna
Weiner, January
collection PubMed
description MOTIVATION: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies.
RESULTS: We evaluated eight established and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to eight established algorithms, we also included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility.
AVAILABILITY AND IMPLEMENTATION: tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html) and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO, KEGGandMetacoreDzPathwaysGEO R package and GEO repository.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
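The abstract describes CERNO only as "based on modified Fisher P-value integration". A minimal sketch of that idea follows, assuming the statistic takes the Fisher-like form F = -2 Σ ln(r_i / N), where r_i are the ranks of the gene set members in an ordered gene list of length N, and that F is compared against a chi-squared distribution with 2k degrees of freedom for a set of k genes. The function names and the toy gene list are hypothetical, and this is not the tmod implementation.

```python
import math


def cerno_statistic(ranks, n_genes):
    """Fisher-style statistic over relative ranks: F = -2 * sum(ln(r_i / N))."""
    return -2.0 * sum(math.log(r / n_genes) for r in ranks)


def chi2_sf_even_df(x, k):
    """Upper-tail probability of a chi-squared variable with 2k degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!, so no external library is needed.
    """
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def cerno_test(ranked_genes, gene_set):
    """Return (statistic, p_value) for one gene set against an ordered gene list.

    ranked_genes is assumed to be sorted from most to least interesting
    (for example by P-value); ranks are 1-based.
    """
    members = set(gene_set)
    ranks = [i for i, g in enumerate(ranked_genes, start=1) if g in members]
    if not ranks:
        raise ValueError("gene set shares no genes with the ranked list")
    stat = cerno_statistic(ranks, len(ranked_genes))
    return stat, chi2_sf_even_df(stat, len(ranks))


# Hypothetical toy data: ten genes, already ordered by some ranking metric.
toy_ranking = [f"g{i}" for i in range(1, 11)]
top_stat, top_p = cerno_test(toy_ranking, {"g1", "g2"})
bottom_stat, bottom_p = cerno_test(toy_ranking, {"g9", "g10"})
print(f"top set: F={top_stat:.3f}, p={top_p:.4f}")
print(f"bottom set: F={bottom_stat:.3f}, p={bottom_p:.4f}")
```

Because only ranks enter the statistic, any metric that orders the genes can be used, which is consistent with the abstract's remark that CERNO is robust to the ranking metric. A gene set concentrated near the top of the list yields a larger F and a smaller P-value than one near the bottom.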
format Online
Article
Text
id pubmed-6954644
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6954644 2020-01-16
Bioinformatics
Original Papers
Oxford University Press 2019-12-15 2019-06-04
/pmc/articles/PMC6954644/ /pubmed/31165139 http://dx.doi.org/10.1093/bioinformatics/btz447
Text en
© The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954644/
https://www.ncbi.nlm.nih.gov/pubmed/31165139
http://dx.doi.org/10.1093/bioinformatics/btz447