Cargando…

FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization

MOTIVATION: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on...

Descripción completa

Detalles Bibliográficos
Autores principales: Vachaspati, Pranjal, Warnow, Tandy
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870905/
https://www.ncbi.nlm.nih.gov/pubmed/27663499
http://dx.doi.org/10.1093/bioinformatics/btw600
_version_ 1783309560861163520
author Vachaspati, Pranjal
Warnow, Tandy
author_facet Vachaspati, Pranjal
Warnow, Tandy
author_sort Vachaspati, Pranjal
collection PubMed
description MOTIVATION: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on subsets, are critically important tools for enabling the statistical estimation of phylogenies for large and potentially heterogeneous datasets. Supertree estimation is itself NP-hard, and no current supertree method has sufficient accuracy and scalability to provide good accuracy on the large datasets that supertree methods were designed for, containing thousands of species and many subset trees. RESULTS: We present FastRFS, a new method based on a dynamic programming method we have developed to find an exact solution to the Robinson-Foulds Supertree problem within a constrained search space. FastRFS has excellent accuracy in terms of criterion scores and topological accuracy of the resultant trees, substantially improving on competing methods on a large collection of biological and simulated data. In addition, FastRFS is extremely fast, finishing in minutes on even very large datasets, and in under an hour on a biological dataset with 2228 species. AVAILABILITY AND IMPLEMENTATION: FastRFS is available on github at https://github.com/pranjalv123/FastRFS SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5870905
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-58709052018-03-29 FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization Vachaspati, Pranjal Warnow, Tandy Bioinformatics Recomb-Cg 2016 MOTIVATION: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on subsets, are critically important tools for enabling the statistical estimation of phylogenies for large and potentially heterogeneous datasets. Supertree estimation is itself NP-hard, and no current supertree method has sufficient accuracy and scalability to provide good accuracy on the large datasets that supertree methods were designed for, containing thousands of species and many subset trees. RESULTS: We present FastRFS, a new method based on a dynamic programming method we have developed to find an exact solution to the Robinson-Foulds Supertree problem within a constrained search space. FastRFS has excellent accuracy in terms of criterion scores and topological accuracy of the resultant trees, substantially improving on competing methods on a large collection of biological and simulated data. In addition, FastRFS is extremely fast, finishing in minutes on even very large datasets, and in under an hour on a biological dataset with 2228 species. AVAILABILITY AND IMPLEMENTATION: FastRFS is available on github at https://github.com/pranjalv123/FastRFS SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2017-03-01 2016-09-23 /pmc/articles/PMC5870905/ /pubmed/27663499 http://dx.doi.org/10.1093/bioinformatics/btw600 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Recomb-Cg 2016
Vachaspati, Pranjal
Warnow, Tandy
FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization
title FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization
title_full FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization
title_fullStr FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization
title_full_unstemmed FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization
title_short FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization
title_sort fastrfs: fast and accurate robinson-foulds supertrees using constrained exact optimization
topic Recomb-Cg 2016
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870905/
https://www.ncbi.nlm.nih.gov/pubmed/27663499
http://dx.doi.org/10.1093/bioinformatics/btw600
work_keys_str_mv AT vachaspatipranjal fastrfsfastandaccuraterobinsonfouldssupertreesusingconstrainedexactoptimization
AT warnowtandy fastrfsfastandaccuraterobinsonfouldssupertreesusingconstrainedexactoptimization