From Easy to Hopeless—Predicting the Difficulty of Phylogenetic Analyses
Phylogenetic analyses under the Maximum-Likelihood (ML) model are time and resource intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies, on others to multiple, topol...
Main authors: | Haag, Julia; Höhler, Dimitri; Bettisworth, Ben; Stamatakis, Alexandros |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Discoveries |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9728795/ https://www.ncbi.nlm.nih.gov/pubmed/36395091 http://dx.doi.org/10.1093/molbev/msac254 |
_version_ | 1784845340390719488 |
---|---|
author | Haag, Julia Höhler, Dimitri Bettisworth, Ben Stamatakis, Alexandros |
author_facet | Haag, Julia Höhler, Dimitri Bettisworth, Ben Stamatakis, Alexandros |
author_sort | Haag, Julia |
collection | PubMed |
description | Phylogenetic analyses under the Maximum-Likelihood (ML) model are time and resource intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies, on others to multiple, topologically highly distinct yet statistically indistinguishable topologies. At present, no method exists to quantify and predict this behavior. We introduce a method to quantify the degree of difficulty for analyzing a dataset and present Pythia, a Random Forest Regressor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating ML-based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets. |
format | Online Article Text |
id | pubmed-9728795 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9728795 2022-12-08 From Easy to Hopeless—Predicting the Difficulty of Phylogenetic Analyses Haag, Julia Höhler, Dimitri Bettisworth, Ben Stamatakis, Alexandros Mol Biol Evol Discoveries Phylogenetic analyses under the Maximum-Likelihood (ML) model are time and resource intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies, on others to multiple, topologically highly distinct yet statistically indistinguishable topologies. At present, no method exists to quantify and predict this behavior. We introduce a method to quantify the degree of difficulty for analyzing a dataset and present Pythia, a Random Forest Regressor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating ML-based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets. Oxford University Press 2022-11-17 /pmc/articles/PMC9728795/ /pubmed/36395091 http://dx.doi.org/10.1093/molbev/msac254 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Discoveries Haag, Julia Höhler, Dimitri Bettisworth, Ben Stamatakis, Alexandros From Easy to Hopeless—Predicting the Difficulty of Phylogenetic Analyses |
title | From Easy to Hopeless—Predicting the Difficulty of Phylogenetic Analyses |
topic | Discoveries |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9728795/ https://www.ncbi.nlm.nih.gov/pubmed/36395091 http://dx.doi.org/10.1093/molbev/msac254 |