
Information geometry for phylogenetic trees

We propose a new space of phylogenetic trees which we call wald space. The motivation is to develop a space suitable for statistical analysis of phylogenies, but with a geometry based on more biologically principled assumptions than existing spaces: in wald space, trees are close if they induce simi...

Bibliographic Details
Main authors: Garba, M. K., Nye, T. M. W., Lueg, J., Huckemann, S. F.
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7884381/
https://www.ncbi.nlm.nih.gov/pubmed/33590321
http://dx.doi.org/10.1007/s00285-021-01553-x
author Garba, M. K.
Nye, T. M. W.
Lueg, J.
Huckemann, S. F.
collection PubMed
description We propose a new space of phylogenetic trees which we call wald space. The motivation is to develop a space suitable for statistical analysis of phylogenies, but with a geometry based on more biologically principled assumptions than existing spaces: in wald space, trees are close if they induce similar distributions on genetic sequence data. As a point set, wald space contains the previously developed Billera–Holmes–Vogtmann (BHV) tree space; it also contains disconnected forests, like the edge-product (EP) space but without certain singularities of the EP space. We investigate two related geometries on wald space. The first is the geometry of the Fisher information metric of character distributions induced by the two-state symmetric Markov substitution process on each tree. Infinitesimally, the metric is proportional to the Kullback–Leibler divergence, or equivalently, as we show, to any f-divergence. The second geometry is obtained analogously but using a related continuous-valued Gaussian process on each tree, and it can be viewed as the trace metric of the affine-invariant metric for covariance matrices. We derive a gradient descent algorithm to project from the ambient space of covariance matrices to wald space. For both geometries we derive computational methods to compute geodesics in polynomial time and show numerically that the two information geometries (discrete and continuous) are very similar. In particular, geodesics are approximated extrinsically. Comparison with the BHV geometry shows that our canonical and biologically motivated space is substantially different.
format Online
Article
Text
id pubmed-7884381
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-7884381 2021-02-25 Information geometry for phylogenetic trees Garba, M. K. Nye, T. M. W. Lueg, J. Huckemann, S. F. J Math Biol Article
Springer Berlin Heidelberg 2021-02-15 2021 /pmc/articles/PMC7884381/ /pubmed/33590321 http://dx.doi.org/10.1007/s00285-021-01553-x Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Information geometry for phylogenetic trees
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7884381/
https://www.ncbi.nlm.nih.gov/pubmed/33590321
http://dx.doi.org/10.1007/s00285-021-01553-x