
DACTAL: divide-and-conquer trees (almost) without alignments

Motivation: While phylogenetic analyses of datasets containing 1000–5000 sequences are challenging for existing methods, the estimation of substantially larger phylogenies poses a problem of much greater complexity and scale. Methods: We present DACTAL, a method for phylogeny estimation that produce...


Bibliographic Details
Main Authors: Nelesen, Serita; Liu, Kevin; Wang, Li-San; Linder, C. Randal; Warnow, Tandy
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: ISMB 2012 Proceedings Papers Committee July 15 to July 19, 2012, Long Beach, CA, USA
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371850/
https://www.ncbi.nlm.nih.gov/pubmed/22689772
http://dx.doi.org/10.1093/bioinformatics/bts218
author Nelesen, Serita
Liu, Kevin
Wang, Li-San
Linder, C. Randal
Warnow, Tandy
collection PubMed
description Motivation: While phylogenetic analyses of datasets containing 1000–5000 sequences are challenging for existing methods, the estimation of substantially larger phylogenies poses a problem of much greater complexity and scale. Methods: We present DACTAL, a method for phylogeny estimation that produces trees from unaligned sequence datasets without ever needing to estimate an alignment on the entire dataset. DACTAL combines iteration with a novel divide-and-conquer approach, so that each iteration begins with a tree produced in the prior iteration, decomposes the taxon set into overlapping subsets, estimates trees on each subset, and then combines the smaller trees into a tree on the full taxon set using a new supertree method. We prove that DACTAL is guaranteed to produce the true tree under certain conditions. We compare DACTAL to SATé and maximum likelihood trees on estimated alignments using simulated and real datasets with 1000–27 643 taxa. Results: Our studies show that on average DACTAL yields more accurate trees than the two-phase methods we studied on very large datasets that are difficult to align, and has approximately the same accuracy on the easier datasets. The comparison to SATé shows that both have the same accuracy, but that DACTAL achieves this accuracy in a fraction of the time. Furthermore, DACTAL can analyze larger datasets than SATé, including a dataset with almost 28 000 sequences. Availability: DACTAL source code and results of dataset analyses are available at www.cs.utexas.edu/users/phylo/software/dactal. Contact: tandy@cs.utexas.edu
format Online
Article
Text
id pubmed-3371850
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3371850 2012-06-11 DACTAL: divide-and-conquer trees (almost) without alignments Nelesen, Serita; Liu, Kevin; Wang, Li-San; Linder, C. Randal; Warnow, Tandy Bioinformatics ISMB 2012 Proceedings Papers Committee July 15 to July 19, 2012, Long Beach, CA, USA Oxford University Press 2012-06-15 2012-06-09 /pmc/articles/PMC3371850/ /pubmed/22689772 http://dx.doi.org/10.1093/bioinformatics/bts218 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title DACTAL: divide-and-conquer trees (almost) without alignments
topic ISMB 2012 Proceedings Papers Committee July 15 to July 19, 2012, Long Beach, CA, USA
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371850/
https://www.ncbi.nlm.nih.gov/pubmed/22689772
http://dx.doi.org/10.1093/bioinformatics/bts218