Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees

BACKGROUND: The rapid accumulation of molecular sequence data, driven by novel wet-lab sequencing technologies, poses new challenges for large-scale maximum likelihood-based phylogenetic analyses on trees with more than 30,000 taxa and several genes. The three main computational challenges are: nume...

Bibliographic Details
Main authors:	Izquierdo-Carrasco, Fernando; Smith, Stephen A; Stamatakis, Alexandros
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3267785/ https://www.ncbi.nlm.nih.gov/pubmed/22165866 http://dx.doi.org/10.1186/1471-2105-12-470

_version_	1782222327169155072
author	Izquierdo-Carrasco, Fernando Smith, Stephen A Stamatakis, Alexandros
author_facet	Izquierdo-Carrasco, Fernando Smith, Stephen A Stamatakis, Alexandros
author_sort	Izquierdo-Carrasco, Fernando
collection	PubMed
description	BACKGROUND: The rapid accumulation of molecular sequence data, driven by novel wet-lab sequencing technologies, poses new challenges for large-scale maximum likelihood-based phylogenetic analyses on trees with more than 30,000 taxa and several genes. The three main computational challenges are: numerical stability, the scalability of search algorithms, and the high memory requirements for computing the likelihood. RESULTS: We introduce methods for solving these three key problems and provide respective proof-of-concept implementations in RAxML. The mechanisms presented here are not RAxML-specific and can thus be applied to any likelihood-based (Bayesian or maximum likelihood) tree inference program. We develop a new search strategy that can reduce the time required for tree inferences by more than 50% while yielding equally good trees (in the statistical sense) for well-chosen starting trees. We present an adaptation of the Subtree Equality Vector technique for phylogenomic datasets with missing data (already available in RAxML v728) that can reduce execution times and memory requirements by up to 50%. Finally, we discuss issues pertaining to the numerical stability of the Γ model of rate heterogeneity on very large trees and argue in favor of rate heterogeneity models that use a single rate or rate category for each site to resolve these problems. CONCLUSIONS: We address three major issues pertaining to large scale tree reconstruction under maximum likelihood and propose respective solutions. Respective proof-of-concept/production-level implementations of our ideas are made available as open-source code.
format	Online Article Text
id	pubmed-3267785
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3267785 2012-01-30 Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees Izquierdo-Carrasco, Fernando Smith, Stephen A Stamatakis, Alexandros BMC Bioinformatics Research Article
BioMed Central 2011-12-13 /pmc/articles/PMC3267785/ /pubmed/22165866 http://dx.doi.org/10.1186/1471-2105-12-470 Text en Copyright ©2011 Izquierdo-Carrasco et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Izquierdo-Carrasco, Fernando Smith, Stephen A Stamatakis, Alexandros Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
title	Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
title_full	Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
title_fullStr	Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
title_full_unstemmed	Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
title_short	Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
title_sort	algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3267785/ https://www.ncbi.nlm.nih.gov/pubmed/22165866 http://dx.doi.org/10.1186/1471-2105-12-470
work_keys_str_mv	AT izquierdocarrascofernando algorithmsdatastructuresandnumericsforlikelihoodbasedphylogeneticinferenceofhugetrees AT smithstephena algorithmsdatastructuresandnumericsforlikelihoodbasedphylogeneticinferenceofhugetrees AT stamatakisalexandros algorithmsdatastructuresandnumericsforlikelihoodbasedphylogeneticinferenceofhugetrees

Similar Items