
SCAMPP+FastTree: improving scalability for likelihood-based phylogenetic placement

Bibliographic Details
Main authors: Chu, Gillian; Warnow, Tandy
Format: Online article (text)
Language: English
Published: Oxford University Press, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9933845/
https://www.ncbi.nlm.nih.gov/pubmed/36818728
http://dx.doi.org/10.1093/bioadv/vbad008
Collection: PubMed
Description: SUMMARY: Phylogenetic placement is the problem of placing ‘query’ sequences into an existing tree (called a ‘backbone tree’). One of the most accurate phylogenetic placement methods to date is the maximum likelihood-based method pplacer, which uses RAxML to estimate numeric parameters on the backbone tree and then adds the given query sequence to the edge that maximizes the probability that the resulting tree generates the query sequence. Unfortunately, this way of running pplacer fails to return valid outputs on many moderately large backbone trees, and so is limited to backbone trees with at most ∼10 000 leaves. SCAMPP is a technique that enables pplacer to run on larger backbone trees: it finds a small ‘placement subtree’ specific to each query sequence, within which the query sequence is placed using pplacer. That approach matched the scalability and accuracy of APPLES-2, the previous most scalable method. Here, we explore a different aspect of pplacer’s strategy: the technique used to estimate numeric parameters on the backbone tree. We confirm anecdotal evidence that using FastTree instead of RAxML to estimate numeric parameters on the backbone tree enables pplacer to scale to much larger backbone trees, almost (but not quite) matching the scalability of APPLES-2 and pplacer-SCAMPP. We then evaluate the combination of these two techniques: SCAMPP and the use of FastTree. We show that this combined approach, pplacer-SCAMPP-FastTree, has the same scalability as APPLES-2, improves on the scalability of pplacer-FastTree, and achieves better accuracy than the comparably scalable methods.

AVAILABILITY AND IMPLEMENTATION: https://github.com/gillichu/PLUSplacer-taxtastic.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
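The SCAMPP step in the description (finding a small placement subtree specific to each query sequence) can be illustrated with a short sketch. This is not the authors' code. It assumes, based on the published SCAMPP method as we understand it, that the subtree is seeded at the backbone leaf with the smallest Hamming distance to the aligned query and then grown by taking the leaves nearest to that seed by tree path length (sum of branch lengths). The function names, the hypothetical `max_leaves` parameter, the adjacency-list tree and the toy sequences are all illustrative assumptions. The sketch only selects the leaves; building the induced subtree, running pplacer on it and mapping the placement back to the full backbone tree are omitted.

```python
import heapq


def hamming(a, b):
    """Number of aligned sites at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def placement_subtree_leaves(adj, leaf_seqs, query, max_leaves):
    """Select up to max_leaves backbone leaves for a query's placement subtree.

    adj: node -> list of (neighbour, branch_length) for an unrooted tree.
    leaf_seqs: leaf name -> aligned sequence (same length as query).
    max_leaves: hypothetical cap on subtree size (illustrative only).
    """
    # 1. Seed leaf: the backbone leaf closest to the query by Hamming distance.
    seed = min(leaf_seqs, key=lambda leaf: hamming(leaf_seqs[leaf], query))
    # 2. Dijkstra from the seed; collect leaves in order of tree path length.
    best = {seed: 0.0}
    heap = [(0.0, seed)]
    visited, chosen = set(), []
    while heap and len(chosen) < max_leaves:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node in leaf_seqs:
            chosen.append(node)
        for nbr, length in adj[node]:
            nd = d + length
            if nbr not in visited and nd < best.get(nbr, float("inf")):
                best[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return chosen


# Toy unrooted backbone: leaves A-D, internal nodes X and Y.
edges = [("A", "X", 0.1), ("B", "X", 0.2), ("X", "Y", 0.3),
         ("C", "Y", 0.1), ("D", "Y", 0.4)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))
leaf_seqs = {"A": "ACGTAC", "B": "ACGTTT", "C": "TTGTAC", "D": "TTTTTT"}

print(placement_subtree_leaves(adj, leaf_seqs, "ACGTAA", 3))  # ['A', 'B', 'C']
```

Using Dijkstra's algorithm rather than a plain breadth-first search ranks leaves by summed branch length rather than by edge count; the two orders can differ when branch lengths vary across the tree.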
Record ID: pubmed-9933845
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: Bioinform Adv, Original Paper (record pubmed-9933845, dated 2023-02-17); published online 2023-01-30 by Oxford University Press.
Rights: © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Original Paper