FASTRAL: improving scalability of phylogenomic analysis

MOTIVATION: ASTRAL is the current leading method for species tree estimation from phylogenomic datasets (i.e. hundreds to thousands of genes) that addresses gene tree discord resulting from incomplete lineage sorting (ILS). ASTRAL is statistically consistent under the multi-locus coalescent model (M...

Bibliographic Details
Main authors: Dibaeinia, Payam, Tabe-Bordbar, Shayan, Warnow, Tandy
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8388037/
https://www.ncbi.nlm.nih.gov/pubmed/33576396
http://dx.doi.org/10.1093/bioinformatics/btab093
author Dibaeinia, Payam
Tabe-Bordbar, Shayan
Warnow, Tandy
collection PubMed
description MOTIVATION: ASTRAL is the current leading method for species tree estimation from phylogenomic datasets (i.e. hundreds to thousands of genes) that addresses gene tree discord resulting from incomplete lineage sorting (ILS). ASTRAL is statistically consistent under the multi-locus coalescent model (MSC), runs in polynomial time, and is able to run on large datasets. Key to ASTRAL’s algorithm is the use of dynamic programming to find an optimal solution to the MQSST (maximum quartet support supertree) within a constraint space that it computes from the input. Yet, ASTRAL can fail to complete within reasonable timeframes on large datasets with many genes and species, because in these cases the constraint space it computes is too large. RESULTS: Here, we introduce FASTRAL, a phylogenomic estimation method. FASTRAL is based on ASTRAL, but uses a different technique for constructing the constraint space. The technique we use to define the constraint space maintains statistical consistency and is polynomial time; thus we prove that FASTRAL is a polynomial time algorithm that is statistically consistent under the MSC. Our performance study on both biological and simulated datasets demonstrates that FASTRAL matches or improves on ASTRAL with respect to species tree topology accuracy (and under high ILS conditions it is statistically significantly more accurate), while being dramatically faster, especially on datasets with large numbers of genes and high ILS, due to using a significantly smaller constraint space. AVAILABILITY AND IMPLEMENTATION: FASTRAL is available in open-source form at https://github.com/PayamDiba/FASTRAL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8388037
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8388037 2021-08-26 FASTRAL: improving scalability of phylogenomic analysis. Dibaeinia, Payam; Tabe-Bordbar, Shayan; Warnow, Tandy. Bioinformatics, Original Papers.
Oxford University Press 2021-02-08 /pmc/articles/PMC8388037/ /pubmed/33576396 http://dx.doi.org/10.1093/bioinformatics/btab093 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Original Papers