Cargando…

Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge

BACKGROUND: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datase...

Descripción completa

Detalles Bibliográficos
Autores principales:	Molloy, Erin K., Warnow, Tandy
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2019
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6642500/ https://www.ncbi.nlm.nih.gov/pubmed/31360216 http://dx.doi.org/10.1186/s13015-019-0151-x

_version_	1783436986330120192
author	Molloy, Erin K. Warnow, Tandy
author_facet	Molloy, Erin K. Warnow, Tandy
author_sort	Molloy, Erin K.
collection	PubMed
description	BACKGROUND: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches. RESULTS: In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. We prove that NJMerge can be used to create divide-and-conquer pipelines that are statistically consistent under some models of evolution. We also report the results of an extensive simulation study evaluating NJMerge on multi-locus datasets with up to 1000 species. We found that NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and “concatenation” using RAxML) without sacrificing accuracy. Finally, although NJMerge can fail to return a tree, in our experiments, NJMerge failed on only 11 out of 2560 test cases. CONCLUSIONS: Theoretical and empirical results suggest that NJMerge is a valuable technique for large-scale phylogeny estimation, especially when computational resources are limited. NJMerge is freely available on Github (http://github.com/ekmolloy/njmerge). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13015-019-0151-x) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6642500
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-66425002019-07-29 Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge Molloy, Erin K. Warnow, Tandy Algorithms Mol Biol Research BACKGROUND: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches. RESULTS: In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. We prove that NJMerge can be used to create divide-and-conquer pipelines that are statistically consistent under some models of evolution. We also report the results of an extensive simulation study evaluating NJMerge on multi-locus datasets with up to 1000 species. We found that NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and “concatenation” using RAxML) without sacrificing accuracy. Finally, although NJMerge can fail to return a tree, in our experiments, NJMerge failed on only 11 out of 2560 test cases. CONCLUSIONS: Theoretical and empirical results suggest that NJMerge is a valuable technique for large-scale phylogeny estimation, especially when computational resources are limited. NJMerge is freely available on Github (http://github.com/ekmolloy/njmerge). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13015-019-0151-x) contains supplementary material, which is available to authorized users. BioMed Central 2019-07-19 /pmc/articles/PMC6642500/ /pubmed/31360216 http://dx.doi.org/10.1186/s13015-019-0151-x Text en © The Author(s) 2019 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Molloy, Erin K. Warnow, Tandy Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge
title	Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge
title_full	Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge
title_fullStr	Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge
title_full_unstemmed	Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge
title_short	Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge
title_sort	statistically consistent divide-and-conquer pipelines for phylogeny estimation using njmerge
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6642500/ https://www.ncbi.nlm.nih.gov/pubmed/31360216 http://dx.doi.org/10.1186/s13015-019-0151-x
work_keys_str_mv	AT molloyerink statisticallyconsistentdivideandconquerpipelinesforphylogenyestimationusingnjmerge AT warnowtandy statisticallyconsistentdivideandconquerpipelinesforphylogenyestimationusingnjmerge

Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge

Ejemplares similares