Significance evaluation in factor graphs
BACKGROUND: Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for analysis of both genomics data and data from other scientific fields. Owing to the ever larger data sets encountered in genomi...
Main authors: | Madsen, Tobias; Hobolth, Asger; Jensen, Jens Ledet; Pedersen, Jakob Skou |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374669/ https://www.ncbi.nlm.nih.gov/pubmed/28359297 http://dx.doi.org/10.1186/s12859-017-1614-z |
_version_ | 1782518939339718656 |
---|---|
author | Madsen, Tobias; Hobolth, Asger; Jensen, Jens Ledet; Pedersen, Jakob Skou |
author_facet | Madsen, Tobias; Hobolth, Asger; Jensen, Jens Ledet; Pedersen, Jakob Skou |
author_sort | Madsen, Tobias |
collection | PubMed |
description | BACKGROUND: Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for analysis of both genomics data and data from other scientific fields. Owing to the ever larger data sets encountered in genomics and the multiple-testing issues accompanying them, accurate significance evaluation is of great importance. Here we address the problem of evaluating the statistical significance of observations from factor graph models. RESULTS: Two novel numerical approximations for evaluating statistical significance are presented. The first uses importance sampling; the second is based on a saddlepoint approximation. We develop algorithms to efficiently compute the approximations and compare them to naive sampling and the normal approximation. The individual merits of the methods are analysed both from a theoretical viewpoint and with simulations. A guideline for choosing between the normal approximation, saddlepoint approximation and importance sampling is also provided. Finally, the applicability of the methods is demonstrated with examples from cancer genomics, motif analysis and phylogenetics. CONCLUSIONS: The applicability of saddlepoint approximation and importance sampling is demonstrated on known models in the factor graph framework. Using the two methods we can substantially reduce computational cost without compromising accuracy. This contribution allows analyses of large datasets in the general factor graph framework. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1614-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5374669 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-53746692017-04-03 Significance evaluation in factor graphs Madsen, Tobias; Hobolth, Asger; Jensen, Jens Ledet; Pedersen, Jakob Skou BMC Bioinformatics Methodology Article BACKGROUND: Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for analysis of both genomics data and data from other scientific fields. Owing to the ever larger data sets encountered in genomics and the multiple-testing issues accompanying them, accurate significance evaluation is of great importance. Here we address the problem of evaluating the statistical significance of observations from factor graph models. RESULTS: Two novel numerical approximations for evaluating statistical significance are presented. The first uses importance sampling; the second is based on a saddlepoint approximation. We develop algorithms to efficiently compute the approximations and compare them to naive sampling and the normal approximation. The individual merits of the methods are analysed both from a theoretical viewpoint and with simulations. A guideline for choosing between the normal approximation, saddlepoint approximation and importance sampling is also provided. Finally, the applicability of the methods is demonstrated with examples from cancer genomics, motif analysis and phylogenetics. CONCLUSIONS: The applicability of saddlepoint approximation and importance sampling is demonstrated on known models in the factor graph framework. Using the two methods we can substantially reduce computational cost without compromising accuracy. This contribution allows analyses of large datasets in the general factor graph framework. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1614-z) contains supplementary material, which is available to authorized users.
BioMed Central 2017-03-31 /pmc/articles/PMC5374669/ /pubmed/28359297 http://dx.doi.org/10.1186/s12859-017-1614-z Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Madsen, Tobias Hobolth, Asger Jensen, Jens Ledet Pedersen, Jakob Skou Significance evaluation in factor graphs |
title | Significance evaluation in factor graphs |
title_full | Significance evaluation in factor graphs |
title_fullStr | Significance evaluation in factor graphs |
title_full_unstemmed | Significance evaluation in factor graphs |
title_short | Significance evaluation in factor graphs |
title_sort | significance evaluation in factor graphs |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374669/ https://www.ncbi.nlm.nih.gov/pubmed/28359297 http://dx.doi.org/10.1186/s12859-017-1614-z |
work_keys_str_mv | AT madsentobias significanceevaluationinfactorgraphs AT hobolthasger significanceevaluationinfactorgraphs AT jensenjensledet significanceevaluationinfactorgraphs AT pedersenjakobskou significanceevaluationinfactorgraphs |
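
The abstract in this record names importance sampling as one way to evaluate small tail probabilities. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes a toy model in which the score S is a sum of n independent Bernoulli(p) variables (a stand-in for a factor graph with no dependencies), and the function names `exact_tail` and `importance_sampling_tail` are hypothetical. The proposal is an exponentially tilted Bernoulli whose mean is shifted to the threshold t, and each sample is reweighted by the likelihood ratio.

```python
import math
import random


def exact_tail(n, p, t):
    """Exact P(S >= t) for S ~ Binomial(n, p); used only as a reference value."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))


def importance_sampling_tail(n, p, t, num_samples, rng):
    """Estimate P(S >= t) by sampling from an exponentially tilted proposal.

    Assumes 0 < p < t / n < 1, so the tilt parameter is well defined.
    """
    q = t / n  # proposal success probability, chosen so that E_q[S] = t
    theta = math.log(q * (1 - p) / (p * (1 - q)))  # tilt that maps p to q
    log_mgf = math.log(1 - p + p * math.exp(theta))  # log E_p[exp(theta * X)]
    total = 0.0
    for _ in range(num_samples):
        # Draw S under the proposal distribution.
        s = sum(1 for _ in range(n) if rng.random() < q)
        if s >= t:
            # Likelihood ratio p(x) / q(x) = exp(-theta * s + n * log_mgf).
            total += math.exp(-theta * s + n * log_mgf)
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:              ", exact_tail(100, 0.1, 30))
    print("importance sampling:", importance_sampling_tail(100, 0.1, 30, 20000, rng))
```

With these illustrative parameters the tail probability is far below what naive sampling with the same number of draws would typically resolve, since almost no unweighted samples would reach the threshold; the tilted proposal places about half of its samples in the tail instead.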