Significance evaluation in factor graphs

BACKGROUND: Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for the analysis of genomics data as well as data from other scientific fields. Owing to the ever larger data sets encountered in genomi...

Bibliographic Details
Main authors: Madsen, Tobias; Hobolth, Asger; Jensen, Jens Ledet; Pedersen, Jakob Skou
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374669/
https://www.ncbi.nlm.nih.gov/pubmed/28359297
http://dx.doi.org/10.1186/s12859-017-1614-z
_version_ 1782518939339718656
author Madsen, Tobias
Hobolth, Asger
Jensen, Jens Ledet
Pedersen, Jakob Skou
author_sort Madsen, Tobias
collection PubMed
description BACKGROUND: Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for the analysis of genomics data as well as data from other scientific fields. Owing to the ever larger data sets encountered in genomics and the multiple-testing issues accompanying them, accurate significance evaluation is of great importance. Here we address the problem of evaluating the statistical significance of observations from factor graph models. RESULTS: Two novel numerical approximations for evaluating statistical significance are presented: the first uses importance sampling, and the second is based on a saddlepoint approximation. We develop algorithms to compute the approximations efficiently and compare them to naive sampling and the normal approximation. The individual merits of the methods are analysed both from a theoretical viewpoint and with simulations. A guideline for choosing between the normal approximation, the saddlepoint approximation and importance sampling is also provided. Finally, the applicability of the methods is demonstrated with examples from cancer genomics, motif analysis and phylogenetics. CONCLUSIONS: The applicability of the saddlepoint approximation and importance sampling is demonstrated on known models in the factor graph framework. Using the two methods we can substantially reduce computational cost without compromising accuracy. This contribution allows analyses of large datasets in the general factor graph framework. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1614-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5374669
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5374669 2017-04-03 Significance evaluation in factor graphs Madsen, Tobias Hobolth, Asger Jensen, Jens Ledet Pedersen, Jakob Skou BMC Bioinformatics Methodology Article
BioMed Central 2017-03-31 /pmc/articles/PMC5374669/ /pubmed/28359297 http://dx.doi.org/10.1186/s12859-017-1614-z Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Significance evaluation in factor graphs
topic Methodology Article
work_keys_str_mv AT madsentobias significanceevaluationinfactorgraphs
AT hobolthasger significanceevaluationinfactorgraphs
AT jensenjensledet significanceevaluationinfactorgraphs
AT pedersenjakobskou significanceevaluationinfactorgraphs
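
Illustrative note (not part of the catalog record): the description contrasts naive sampling with importance sampling for evaluating the significance of extreme observations. The sketch below shows that contrast on a toy case only. It assumes the score is a sum of independent Bernoulli variables, which is not one of the article's factor graph models, and the function names (exact_tail, naive_tail, importance_tail) are hypothetical and are not taken from the article or its software. The proposal distribution simply raises the success probability to threshold / n; this is one common choice and is not claimed to be the authors' algorithm.

```python
import math
import random


def exact_tail(n, p, threshold):
    """Exact P(S >= threshold) for S ~ Binomial(n, p); used as a reference."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(threshold, n + 1)
    )


def naive_tail(n, p, threshold, draws, rng):
    """Naive Monte Carlo: sample under the null and count exceedances."""
    hits = 0
    for _ in range(draws):
        score = sum(rng.random() < p for _ in range(n))
        hits += score >= threshold
    return hits / draws


def importance_tail(n, p, threshold, draws, rng):
    """Importance sampling with a shifted Bernoulli proposal.

    Assumption: p < threshold / n < 1, so the proposal probability q is a
    valid probability and places the typical score near the threshold.
    """
    q = threshold / n
    log_w_one = math.log(p / q)               # log weight per success
    log_w_zero = math.log((1 - p) / (1 - q))  # log weight per failure
    total = 0.0
    for _ in range(draws):
        score = sum(rng.random() < q for _ in range(n))
        if score >= threshold:
            # Likelihood ratio null / proposal for the sampled configuration.
            total += math.exp(score * log_w_one + (n - score) * log_w_zero)
    return total / draws


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, threshold = 100, 0.1, 40
    print("exact     :", exact_tail(n, p, threshold))
    print("naive     :", naive_tail(n, p, threshold, 5000, rng))
    print("importance:", importance_tail(n, p, threshold, 5000, rng))
```

With a tail probability this small, a naive sampler of practical size almost never observes an exceedance and reports zero, whereas the reweighted proposal samples near the threshold and gives a usable estimate from a few thousand draws.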