Build a better bootstrap and the RAWR shall beat a random path to your door: phylogenetic support estimation revisited

Bibliographic Details
Main authors: Wang, Wei, Hejasebazzi, Ahmad, Zheng, Julia, Liu, Kevin J
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Evolutionary, Comparative and Population Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336443/
https://www.ncbi.nlm.nih.gov/pubmed/34252944
http://dx.doi.org/10.1093/bioinformatics/btab263
author Wang, Wei
Hejasebazzi, Ahmad
Zheng, Julia
Liu, Kevin J
collection PubMed
description MOTIVATION: The standard bootstrap method is used throughout science and engineering to perform general-purpose non-parametric resampling and re-estimation. Among the most widely cited and widely used such applications is the phylogenetic bootstrap method, which Felsenstein proposed in 1985 as a means to place statistical confidence intervals on an estimated phylogeny (or estimate ‘phylogenetic support’). A key simplifying assumption of the bootstrap method is that input data are independent and identically distributed (i.i.d.). However, the i.i.d. assumption is an over-simplification for biomolecular sequence analysis, as Felsenstein noted. RESULTS: In this study, we introduce a new sequence-aware non-parametric resampling technique, which we refer to as RAWR (‘RAndom Walk Resampling’). RAWR consists of random walks that synthesize and extend the standard bootstrap method and the ‘mirrored inputs’ idea of Landan and Graur. We apply RAWR to the task of phylogenetic support estimation. RAWR’s performance is compared to the state-of-the-art using synthetic and empirical data that span a range of dataset sizes and evolutionary divergence. We show that RAWR support estimates offer comparable or typically superior type I and type II error compared to phylogenetic bootstrap support. We also conduct a re-analysis of large-scale genomic sequence data from a recent study of Darwin’s finches. Our findings clarify phylogenetic uncertainty in a charismatic clade that serves as an important model for complex adaptive evolution. AVAILABILITY AND IMPLEMENTATION: Data and software are publicly available under open-source software and open data licenses at: https://gitlab.msu.edu/liulab/RAWR-study-datasets-and-scripts.
format Online
Article
Text
id pubmed-8336443
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8336443 2021-08-09 Bioinformatics Evolutionary, Comparative and Population Genomics Oxford University Press 2021-07-12 /pmc/articles/PMC8336443/ /pubmed/34252944 http://dx.doi.org/10.1093/bioinformatics/btab263 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Build a better bootstrap and the RAWR shall beat a random path to your door: phylogenetic support estimation revisited
topic Evolutionary, Comparative and Population Genomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336443/
https://www.ncbi.nlm.nih.gov/pubmed/34252944
http://dx.doi.org/10.1093/bioinformatics/btab263