Build a better bootstrap and the RAWR shall beat a random path to your door: phylogenetic support estimation revisited
MOTIVATION: The standard bootstrap method is used throughout science and engineering to perform general-purpose non-parametric resampling and re-estimation. Among the most widely cited and widely used such applications is the phylogenetic bootstrap method, which Felsenstein proposed in 1985 as a mea...
Main authors: | Wang, Wei; Hejasebazzi, Ahmad; Zheng, Julia; Liu, Kevin J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Evolutionary, Comparative and Population Genomics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336443/ https://www.ncbi.nlm.nih.gov/pubmed/34252944 http://dx.doi.org/10.1093/bioinformatics/btab263 |
_version_ | 1783733320903819264 |
---|---|
author | Wang, Wei Hejasebazzi, Ahmad Zheng, Julia Liu, Kevin J |
author_facet | Wang, Wei Hejasebazzi, Ahmad Zheng, Julia Liu, Kevin J |
author_sort | Wang, Wei |
collection | PubMed |
description | MOTIVATION: The standard bootstrap method is used throughout science and engineering to perform general-purpose non-parametric resampling and re-estimation. Among the most widely cited and widely used such applications is the phylogenetic bootstrap method, which Felsenstein proposed in 1985 as a means to place statistical confidence intervals on an estimated phylogeny (or estimate ‘phylogenetic support’). A key simplifying assumption of the bootstrap method is that input data are independent and identically distributed (i.i.d.). However, the i.i.d. assumption is an over-simplification for biomolecular sequence analysis, as Felsenstein noted. RESULTS: In this study, we introduce a new sequence-aware non-parametric resampling technique, which we refer to as RAWR (‘RAndom Walk Resampling’). RAWR consists of random walks that synthesize and extend the standard bootstrap method and the ‘mirrored inputs’ idea of Landan and Graur. We apply RAWR to the task of phylogenetic support estimation. RAWR’s performance is compared to the state-of-the-art using synthetic and empirical data that span a range of dataset sizes and evolutionary divergence. We show that RAWR support estimates offer comparable or typically superior type I and type II error compared to phylogenetic bootstrap support. We also conduct a re-analysis of large-scale genomic sequence data from a recent study of Darwin’s finches. Our findings clarify phylogenetic uncertainty in a charismatic clade that serves as an important model for complex adaptive evolution. AVAILABILITY AND IMPLEMENTATION: Data and software are publicly available under open-source software and open data licenses at: https://gitlab.msu.edu/liulab/RAWR-study-datasets-and-scripts. |
format | Online Article Text |
id | pubmed-8336443 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8336443 2021-08-09 Bioinformatics: Evolutionary, Comparative and Population Genomics. Oxford University Press 2021-07-12 /pmc/articles/PMC8336443/ /pubmed/34252944 http://dx.doi.org/10.1093/bioinformatics/btab263 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Build a better bootstrap and the RAWR shall beat a random path to your door: phylogenetic support estimation revisited |
topic | Evolutionary, Comparative and Population Genomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336443/ https://www.ncbi.nlm.nih.gov/pubmed/34252944 http://dx.doi.org/10.1093/bioinformatics/btab263 |