
Online Bayesian Phylogenetic Inference: Theoretical Foundations via Sequential Monte Carlo

Phylogenetics, the inference of evolutionary trees from molecular sequence data such as DNA, is an enterprise that yields valuable evolutionary understanding of many biological systems. Bayesian phylogenetic algorithms, which approximate a posterior distribution on trees, have become a popular if co...


Bibliographic Details
Main Authors: Dinh, Vu, Darling, Aaron E, Matsen IV, Frederick A
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5920340/
https://www.ncbi.nlm.nih.gov/pubmed/29244177
http://dx.doi.org/10.1093/sysbio/syx087
_version_ 1783317815793549312
author Dinh, Vu
Darling, Aaron E
Matsen IV, Frederick A
author_facet Dinh, Vu
Darling, Aaron E
Matsen IV, Frederick A
author_sort Dinh, Vu
collection PubMed
description Phylogenetics, the inference of evolutionary trees from molecular sequence data such as DNA, is an enterprise that yields valuable evolutionary understanding of many biological systems. Bayesian phylogenetic algorithms, which approximate a posterior distribution on trees, have become a popular if computationally expensive means of doing phylogenetics. Modern data collection technologies are quickly adding new sequences to already substantial databases. With all current techniques for Bayesian phylogenetics, computation must start anew each time a sequence becomes available, making it costly to maintain an up-to-date estimate of a phylogenetic posterior. These considerations highlight the need for an online Bayesian phylogenetic method which can update an existing posterior with new sequences. Here, we provide theoretical results on the consistency and stability of methods for online Bayesian phylogenetic inference based on Sequential Monte Carlo (SMC) and Markov chain Monte Carlo. We first show a consistency result, demonstrating that the method samples from the correct distribution in the limit of a large number of particles. Next, we derive the first reported set of bounds on how phylogenetic likelihood surfaces change when new sequences are added. These bounds enable us to characterize the theoretical performance of sampling algorithms by bounding the effective sample size (ESS) with a given number of particles from below. We show that the ESS is guaranteed to grow linearly as the number of particles in an SMC sampler grows. Surprisingly, this result holds even though the dimensions of the phylogenetic model grow with each new added sequence.
format Online
Article
Text
id pubmed-5920340
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5920340 2018-05-04 Online Bayesian Phylogenetic Inference: Theoretical Foundations via Sequential Monte Carlo Dinh, Vu Darling, Aaron E Matsen IV, Frederick A Syst Biol Regular Articles
Oxford University Press 2018-05 2017-12-13 /pmc/articles/PMC5920340/ /pubmed/29244177 http://dx.doi.org/10.1093/sysbio/syx087 Text en © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Regular Articles
Dinh, Vu
Darling, Aaron E
Matsen IV, Frederick A
Online Bayesian Phylogenetic Inference: Theoretical Foundations via Sequential Monte Carlo
title Online Bayesian Phylogenetic Inference: Theoretical Foundations via Sequential Monte Carlo
topic Regular Articles