Cargando…
FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization
MOTIVATION: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2017
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870905/ https://www.ncbi.nlm.nih.gov/pubmed/27663499 http://dx.doi.org/10.1093/bioinformatics/btw600 |
_version_ | 1783309560861163520 |
---|---|
author | Vachaspati, Pranjal Warnow, Tandy |
author_facet | Vachaspati, Pranjal Warnow, Tandy |
author_sort | Vachaspati, Pranjal |
collection | PubMed |
description | MOTIVATION: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on subsets, are critically important tools for enabling the statistical estimation of phylogenies for large and potentially heterogeneous datasets. Supertree estimation is itself NP-hard, and no current supertree method has sufficient accuracy and scalability to provide good accuracy on the large datasets that supertree methods were designed for, containing thousands of species and many subset trees. RESULTS: We present FastRFS, a new method based on a dynamic programming method we have developed to find an exact solution to the Robinson-Foulds Supertree problem within a constrained search space. FastRFS has excellent accuracy in terms of criterion scores and topological accuracy of the resultant trees, substantially improving on competing methods on a large collection of biological and simulated data. In addition, FastRFS is extremely fast, finishing in minutes on even very large datasets, and in under an hour on a biological dataset with 2228 species. AVAILABILITY AND IMPLEMENTATION: FastRFS is available on github at https://github.com/pranjalv123/FastRFS SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-5870905 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-58709052018-03-29 FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization Vachaspati, Pranjal Warnow, Tandy Bioinformatics Recomb-Cg 2016 MOTIVATION: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on subsets, are critically important tools for enabling the statistical estimation of phylogenies for large and potentially heterogeneous datasets. Supertree estimation is itself NP-hard, and no current supertree method has sufficient accuracy and scalability to provide good accuracy on the large datasets that supertree methods were designed for, containing thousands of species and many subset trees. RESULTS: We present FastRFS, a new method based on a dynamic programming method we have developed to find an exact solution to the Robinson-Foulds Supertree problem within a constrained search space. FastRFS has excellent accuracy in terms of criterion scores and topological accuracy of the resultant trees, substantially improving on competing methods on a large collection of biological and simulated data. In addition, FastRFS is extremely fast, finishing in minutes on even very large datasets, and in under an hour on a biological dataset with 2228 species. AVAILABILITY AND IMPLEMENTATION: FastRFS is available on github at https://github.com/pranjalv123/FastRFS SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2017-03-01 2016-09-23 /pmc/articles/PMC5870905/ /pubmed/27663499 http://dx.doi.org/10.1093/bioinformatics/btw600 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Recomb-Cg 2016 Vachaspati, Pranjal Warnow, Tandy FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization |
title | FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization |
title_full | FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization |
title_fullStr | FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization |
title_full_unstemmed | FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization |
title_short | FastRFS: fast and accurate Robinson-Foulds Supertrees using constrained exact optimization |
title_sort | fastrfs: fast and accurate robinson-foulds supertrees using constrained exact optimization |
topic | Recomb-Cg 2016 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870905/ https://www.ncbi.nlm.nih.gov/pubmed/27663499 http://dx.doi.org/10.1093/bioinformatics/btw600 |
work_keys_str_mv | AT vachaspatipranjal fastrfsfastandaccuraterobinsonfouldssupertreesusingconstrainedexactoptimization AT warnowtandy fastrfsfastandaccuraterobinsonfouldssupertreesusingconstrainedexactoptimization |