Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution

BACKGROUND: Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each mode...

Bibliographic Details
Main authors: Baele, Guy; Lemey, Philippe; Vansteelandt, Stijn
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3651733/
https://www.ncbi.nlm.nih.gov/pubmed/23497171
http://dx.doi.org/10.1186/1471-2105-14-85
collection PubMed
description BACKGROUND: Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model’s marginal likelihood is estimated individually, so the resulting Bayes factor suffers from the errors associated with each of these independent estimation processes.
RESULTS: We here assess the original ‘model-switch’ path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model’s marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process.
CONCLUSIONS: We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational effort to approximate the true value of the (log) Bayes factor than the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
format Online
Article
Text
id pubmed-3651733
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3651733 2013-05-14 BMC Bioinformatics Methodology Article BioMed Central 2013-03-06 /pmc/articles/PMC3651733/ /pubmed/23497171 http://dx.doi.org/10.1186/1471-2105-14-85 Text en
Copyright © 2013 Baele et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article