
Weighted ASTRID: fast and accurate species trees from weighted internode distances


Bibliographic Details
Main authors: Liu, Baqiao, Warnow, Tandy
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355063/
https://www.ncbi.nlm.nih.gov/pubmed/37468904
http://dx.doi.org/10.1186/s13015-023-00230-6
_version_ 1785075060888829952
author Liu, Baqiao
Warnow, Tandy
author_facet Liu, Baqiao
Warnow, Tandy
author_sort Liu, Baqiao
collection PubMed
description BACKGROUND: Species tree estimation is a basic step in many biological research projects, but it is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., “gene tree heterogeneity”). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS computes a tree on each genomic region (i.e., a “gene tree”) and then uses these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods that use this basic approach, such as ASTRID and NJst, are provably statistically consistent, run in low-degree polynomial time, and have shown high accuracy under many conditions, which makes them competitive with other popular species tree estimation methods. In this study, inspired by the recent work on weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty in the gene trees into account when computing the internode distance. RESULTS: Our experimental study shows that weighted ASTRID typically improves accuracy over the original (unweighted) ASTRID and is competitive in accuracy with weighted ASTRAL, the state of the art. Our re-implementation of ASTRID also improves the runtime, with marked improvements on large datasets. CONCLUSIONS: Weighted ASTRID is a new and very fast method for species tree estimation that typically improves upon ASTRID and has accuracy comparable to weighted ASTRAL, while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode.
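The average internode distance matrix described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the edge-list tree representation, the function names, and the choice to average each pair only over the gene trees that contain both species are assumptions made for illustration, and the example covers only the unweighted distance. The resulting matrix would then be passed to a distance-based method such as neighbor joining.

```python
from collections import deque

def internode_distances(edges, leaves):
    """For one unrooted tree, return {(x, y): nodes strictly between leaves x and y}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {}
    for x in leaves:
        # Breadth-first search from x counts the edges on the path to every node.
        hops = {x: 0}
        queue = deque([x])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in hops:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        for y in leaves:
            if x < y and y in hops:
                # A path with k edges has k - 1 nodes strictly between its endpoints.
                dist[(x, y)] = hops[y] - 1
    return dist

def average_internode_matrix(gene_trees):
    """gene_trees: list of (edges, leaves) pairs. Averages each species pair over
    the trees that contain both species (an assumption of this sketch)."""
    totals, counts = {}, {}
    for edges, leaves in gene_trees:
        for pair, d in internode_distances(edges, leaves).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}

# Two hypothetical quartet gene trees with internal nodes "u" and "v".
species = ["a", "b", "c", "d"]
tree1 = ([("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")], species)  # ab|cd
tree2 = ([("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("d", "v")], species)  # ac|bd
avg = average_internode_matrix([tree1, tree2])
print(avg[("a", "b")], avg[("a", "d")])
```

In this toy input, species a and b are siblings in one tree (one node between them) and on opposite sides of the internal edge in the other (two nodes between them), so their average distance falls between 1 and 2.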
format Online
Article
Text
id pubmed-10355063
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10355063 2023-07-20 Weighted ASTRID: fast and accurate species trees from weighted internode distances Liu, Baqiao Warnow, Tandy Algorithms Mol Biol Research BioMed Central 2023-07-19 /pmc/articles/PMC10355063/ /pubmed/37468904 http://dx.doi.org/10.1186/s13015-023-00230-6 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Weighted ASTRID: fast and accurate species trees from weighted internode distances
title_sort weighted astrid: fast and accurate species trees from weighted internode distances
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355063/
https://www.ncbi.nlm.nih.gov/pubmed/37468904
http://dx.doi.org/10.1186/s13015-023-00230-6
work_keys_str_mv AT liubaqiao weightedastridfastandaccuratespeciestreesfromweightedinternodedistances
AT warnowtandy weightedastridfastandaccuratespeciestreesfromweightedinternodedistances