Cargando…

Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty

Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multipl...

Descripción completa

Detalles Bibliográficos
Autores principales:	Togkousidis, Anastasis, Kozlov, Oleksiy M, Haag, Julia, Höhler, Dimitri, Stamatakis, Alexandros
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Oxford University Press 2023
Materias:	Methods
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584362/ https://www.ncbi.nlm.nih.gov/pubmed/37804116 http://dx.doi.org/10.1093/molbev/msad227

_version_	1785122722993405952
author	Togkousidis, Anastasis Kozlov, Oleksiy M Haag, Julia Höhler, Dimitri Stamatakis, Alexandros
author_facet	Togkousidis, Anastasis Kozlov, Oleksiy M Haag, Julia Höhler, Dimitri Stamatakis, Alexandros
author_sort	Togkousidis, Anastasis
collection	PubMed
description	Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets ([Formula: see text] of total MSAs), where we observe average speedups of more than [Formula: see text]. Further, approximately [Formula: see text] of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG).
format	Online Article Text
id	pubmed-10584362
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-105843622023-10-19 Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty Togkousidis, Anastasis Kozlov, Oleksiy M Haag, Julia Höhler, Dimitri Stamatakis, Alexandros Mol Biol Evol Methods Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets ([Formula: see text] of total MSAs), where we observe average speedups of more than [Formula: see text]. Further, approximately [Formula: see text] of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG). Oxford University Press 2023-10-06 /pmc/articles/PMC10584362/ /pubmed/37804116 http://dx.doi.org/10.1093/molbev/msad227 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by-nc/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Methods Togkousidis, Anastasis Kozlov, Oleksiy M Haag, Julia Höhler, Dimitri Stamatakis, Alexandros Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty
title	Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty
title_full	Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty
title_fullStr	Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty
title_full_unstemmed	Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty
title_short	Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty
title_sort	adaptive raxml-ng: accelerating phylogenetic inference under maximum likelihood using dataset difficulty
topic	Methods
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584362/ https://www.ncbi.nlm.nih.gov/pubmed/37804116 http://dx.doi.org/10.1093/molbev/msad227
work_keys_str_mv	AT togkousidisanastasis adaptiveraxmlngacceleratingphylogeneticinferenceundermaximumlikelihoodusingdatasetdifficulty AT kozlovoleksiym adaptiveraxmlngacceleratingphylogeneticinferenceundermaximumlikelihoodusingdatasetdifficulty AT haagjulia adaptiveraxmlngacceleratingphylogeneticinferenceundermaximumlikelihoodusingdatasetdifficulty AT hohlerdimitri adaptiveraxmlngacceleratingphylogeneticinferenceundermaximumlikelihoodusingdatasetdifficulty AT stamatakisalexandros adaptiveraxmlngacceleratingphylogeneticinferenceundermaximumlikelihoodusingdatasetdifficulty

Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty

Ejemplares similares