Distance-Based Phylogenetic Placement with Statistical Support

SIMPLE SUMMARY: Phylogenetic placement seeks to find the optimal position for a new query species on an existing backbone tree. Fast and accurate distance-based phylogenetic placement methods lack the crucial feature of estimating the support values for various placements of a query sequence. This s...

Bibliographic Details
Main authors: Hasan, Navid Bin, Balaban, Metin, Biswas, Avijit, Bayzid, Md. Shamsuzzoha, Mirarab, Siavash
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9404983/
https://www.ncbi.nlm.nih.gov/pubmed/36009839
http://dx.doi.org/10.3390/biology11081212
_version_ 1784773768063746048
author Hasan, Navid Bin
Balaban, Metin
Biswas, Avijit
Bayzid, Md. Shamsuzzoha
Mirarab, Siavash
author_facet Hasan, Navid Bin
Balaban, Metin
Biswas, Avijit
Bayzid, Md. Shamsuzzoha
Mirarab, Siavash
author_sort Hasan, Navid Bin
collection PubMed
description SIMPLE SUMMARY: Phylogenetic placement seeks to find the optimal position for a new query species on an existing backbone tree. Fast and accurate distance-based phylogenetic placement methods lack the crucial feature of estimating the support values for various placements of a query sequence. This study presents both parametric and nonparametric methods for measuring the support values of distance-based phylogenetic placements. ABSTRACT: Phylogenetic identification of unknown sequences by placing them on a tree is routinely attempted in modern ecological studies. Such placements are often obtained from incomplete and noisy data, making it essential to augment the results with some notion of uncertainty. While the standard likelihood-based methods designed for placement naturally provide such measures of uncertainty, the newer and more scalable distance-based methods lack this crucial feature. Here, we adopt several parametric and nonparametric sampling methods for measuring the support of phylogenetic placements that have been obtained with the use of distances. Comparing the alternative strategies, we conclude that nonparametric bootstrapping is more accurate than the alternatives. We go on to show how bootstrapping can be performed efficiently using a linear algebraic formulation that makes it up to 30 times faster and implement this optimized version as part of the distance-based placement software APPLES. By examining a wide range of applications, we show that the relative accuracy of maximum likelihood (ML) support values as compared to distance-based methods depends on the application and the dataset. ML is advantageous for fragmentary queries, while distance-based support values are more accurate for full-length and multi-gene datasets. With the quantification of uncertainty, our work fills a crucial gap that prevents the broader adoption of distance-based placement tools.
format Online
Article
Text
id pubmed-9404983
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9404983 2022-08-26 Distance-Based Phylogenetic Placement with Statistical Support Hasan, Navid Bin Balaban, Metin Biswas, Avijit Bayzid, Md. Shamsuzzoha Mirarab, Siavash Biology (Basel) Article SIMPLE SUMMARY: Phylogenetic placement seeks to find the optimal position for a new query species on an existing backbone tree. Fast and accurate distance-based phylogenetic placement methods lack the crucial feature of estimating the support values for various placements of a query sequence. This study presents both parametric and nonparametric methods for measuring the support values of distance-based phylogenetic placements. ABSTRACT: Phylogenetic identification of unknown sequences by placing them on a tree is routinely attempted in modern ecological studies. Such placements are often obtained from incomplete and noisy data, making it essential to augment the results with some notion of uncertainty. While the standard likelihood-based methods designed for placement naturally provide such measures of uncertainty, the newer and more scalable distance-based methods lack this crucial feature. Here, we adopt several parametric and nonparametric sampling methods for measuring the support of phylogenetic placements that have been obtained with the use of distances. Comparing the alternative strategies, we conclude that nonparametric bootstrapping is more accurate than the alternatives. We go on to show how bootstrapping can be performed efficiently using a linear algebraic formulation that makes it up to 30 times faster and implement this optimized version as part of the distance-based placement software APPLES. By examining a wide range of applications, we show that the relative accuracy of maximum likelihood (ML) support values as compared to distance-based methods depends on the application and the dataset. ML is advantageous for fragmentary queries, while distance-based support values are more accurate for full-length and multi-gene datasets.
With the quantification of uncertainty, our work fills a crucial gap that prevents the broader adoption of distance-based placement tools. MDPI 2022-08-12 /pmc/articles/PMC9404983/ /pubmed/36009839 http://dx.doi.org/10.3390/biology11081212 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Hasan, Navid Bin
Balaban, Metin
Biswas, Avijit
Bayzid, Md. Shamsuzzoha
Mirarab, Siavash
Distance-Based Phylogenetic Placement with Statistical Support
title Distance-Based Phylogenetic Placement with Statistical Support
title_full Distance-Based Phylogenetic Placement with Statistical Support
title_fullStr Distance-Based Phylogenetic Placement with Statistical Support
title_full_unstemmed Distance-Based Phylogenetic Placement with Statistical Support
title_short Distance-Based Phylogenetic Placement with Statistical Support
title_sort distance-based phylogenetic placement with statistical support
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9404983/
https://www.ncbi.nlm.nih.gov/pubmed/36009839
http://dx.doi.org/10.3390/biology11081212
work_keys_str_mv AT hasannavidbin distancebasedphylogeneticplacementwithstatisticalsupport
AT balabanmetin distancebasedphylogeneticplacementwithstatisticalsupport
AT biswasavijit distancebasedphylogeneticplacementwithstatisticalsupport
AT bayzidmdshamsuzzoha distancebasedphylogeneticplacementwithstatisticalsupport
AT mirarabsiavash distancebasedphylogeneticplacementwithstatisticalsupport