Using predictive specificity to determine when gene set analysis is biologically meaningful
Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper,...
Main Authors: | Ballouz, Sara; Pavlidis, Paul; Gillis, Jesse |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | Methods Online |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389513/ https://www.ncbi.nlm.nih.gov/pubmed/28204549 http://dx.doi.org/10.1093/nar/gkw957 |
_version_ | 1782521281762033664 |
---|---|
author | Ballouz, Sara Pavlidis, Paul Gillis, Jesse |
author_facet | Ballouz, Sara Pavlidis, Paul Gillis, Jesse |
author_sort | Ballouz, Sara |
collection | PubMed |
description | Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper, we evaluate the robustness and uniqueness of enrichment results as a means of assessing methods even where correctness is unknown. We show that heavily annotated (‘multifunctional’) genes are likely to appear in genomics study results and drive the generation of biologically non-specific enrichment results as well as highly fragile significances. By providing a means of determining where enrichment analyses report non-specific and non-robust findings, we are able to assess where we can be confident in their use. We find significant progress in recent bias correction methods for enrichment and provide our own software implementation. Our approach can be readily adapted to any pre-existing package. |
format | Online Article Text |
id | pubmed-5389513 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-53895132017-04-24 Using predictive specificity to determine when gene set analysis is biologically meaningful Ballouz, Sara Pavlidis, Paul Gillis, Jesse Nucleic Acids Res Methods Online Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper, we evaluate the robustness and uniqueness of enrichment results as a means of assessing methods even where correctness is unknown. We show that heavily annotated (‘multifunctional’) genes are likely to appear in genomics study results and drive the generation of biologically non-specific enrichment results as well as highly fragile significances. By providing a means of determining where enrichment analyses report non-specific and non-robust findings, we are able to assess where we can be confident in their use. We find significant progress in recent bias correction methods for enrichment and provide our own software implementation. Our approach can be readily adapted to any pre-existing package. Oxford University Press 2017-02-28 2016-10-24 /pmc/articles/PMC5389513/ /pubmed/28204549 http://dx.doi.org/10.1093/nar/gkw957 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Online Ballouz, Sara Pavlidis, Paul Gillis, Jesse Using predictive specificity to determine when gene set analysis is biologically meaningful |
title | Using predictive specificity to determine when gene set analysis is biologically meaningful |
title_full | Using predictive specificity to determine when gene set analysis is biologically meaningful |
title_fullStr | Using predictive specificity to determine when gene set analysis is biologically meaningful |
title_full_unstemmed | Using predictive specificity to determine when gene set analysis is biologically meaningful |
title_short | Using predictive specificity to determine when gene set analysis is biologically meaningful |
title_sort | using predictive specificity to determine when gene set analysis is biologically meaningful |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389513/ https://www.ncbi.nlm.nih.gov/pubmed/28204549 http://dx.doi.org/10.1093/nar/gkw957 |
work_keys_str_mv | AT ballouzsara usingpredictivespecificitytodeterminewhengenesetanalysisisbiologicallymeaningful AT pavlidispaul usingpredictivespecificitytodeterminewhengenesetanalysisisbiologicallymeaningful AT gillisjesse usingpredictivespecificitytodeterminewhengenesetanalysisisbiologicallymeaningful |