Cargando…

Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy

BACKGROUND: Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch weights are fixed). W...

Descripción completa

Detalles Bibliográficos
Autores principales: Zhang, Qiuyi, Rao, Satish, Warnow, Tandy
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364484/
https://www.ncbi.nlm.nih.gov/pubmed/30839943
http://dx.doi.org/10.1186/s13015-019-0136-9
_version_ 1783393288994160640
author Zhang, Qiuyi
Rao, Satish
Warnow, Tandy
author_facet Zhang, Qiuyi
Rao, Satish
Warnow, Tandy
author_sort Zhang, Qiuyi
collection PubMed
description BACKGROUND: Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch weights are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was [Formula: see text] published in SODA 2001. The main empirical advantage of [Formula: see text] over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, [Formula: see text] is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. RESULTS: In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of [Formula: see text] but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time and space. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).
format Online
Article
Text
id pubmed-6364484
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-63644842019-02-15 Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy Zhang, Qiuyi Rao, Satish Warnow, Tandy Algorithms Mol Biol Research BACKGROUND: Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch weights are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was [Formula: see text] published in SODA 2001. The main empirical advantage of [Formula: see text] over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, [Formula: see text] is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. RESULTS: In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of [Formula: see text] but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time and space. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees). BioMed Central 2019-02-06 /pmc/articles/PMC6364484/ /pubmed/30839943 http://dx.doi.org/10.1186/s13015-019-0136-9 Text en © The Author(s) 2019 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Zhang, Qiuyi
Rao, Satish
Warnow, Tandy
Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy
title Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy
title_full Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy
title_fullStr Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy
title_full_unstemmed Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy
title_short Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy
title_sort constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364484/
https://www.ncbi.nlm.nih.gov/pubmed/30839943
http://dx.doi.org/10.1186/s13015-019-0136-9
work_keys_str_mv AT zhangqiuyi constrainedincrementaltreebuildingnewabsolutefastconvergingphylogenyestimationmethodswithimprovedscalabilityandaccuracy
AT raosatish constrainedincrementaltreebuildingnewabsolutefastconvergingphylogenyestimationmethodswithimprovedscalabilityandaccuracy
AT warnowtandy constrainedincrementaltreebuildingnewabsolutefastconvergingphylogenyestimationmethodswithimprovedscalabilityandaccuracy