Using predictive specificity to determine when gene set analysis is biologically meaningful

Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper,...

Bibliographic Details
Main authors: Ballouz, Sara, Pavlidis, Paul, Gillis, Jesse
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389513/
https://www.ncbi.nlm.nih.gov/pubmed/28204549
http://dx.doi.org/10.1093/nar/gkw957
_version_ 1782521281762033664
author Ballouz, Sara
Pavlidis, Paul
Gillis, Jesse
collection PubMed
description Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper, we evaluate the robustness and uniqueness of enrichment results as a means of assessing methods even where correctness is unknown. We show that heavily annotated (‘multifunctional’) genes are likely to appear in genomics study results and drive the generation of biologically non-specific enrichment results as well as highly fragile significances. By providing a means of determining where enrichment analyses report non-specific and non-robust findings, we are able to assess where we can be confident in their use. We find significant progress in recent bias correction methods for enrichment and provide our own software implementation. Our approach can be readily adapted to any pre-existing package.
format Online
Article
Text
id pubmed-5389513
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5389513 2017-04-24 Using predictive specificity to determine when gene set analysis is biologically meaningful Ballouz, Sara Pavlidis, Paul Gillis, Jesse Nucleic Acids Res Methods Online Oxford University Press 2017-02-28 2016-10-24 /pmc/articles/PMC5389513/ /pubmed/28204549 http://dx.doi.org/10.1093/nar/gkw957 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Using predictive specificity to determine when gene set analysis is biologically meaningful
topic Methods Online