Weighted ASTRID: fast and accurate species trees from weighted internode distances
BACKGROUND: Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), wh...
Main Authors: | Liu, Baqiao; Warnow, Tandy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355063/ https://www.ncbi.nlm.nih.gov/pubmed/37468904 http://dx.doi.org/10.1186/s13015-023-00230-6 |
_version_ | 1785075060888829952 |
---|---|
author | Liu, Baqiao Warnow, Tandy |
collection | PubMed |
description | BACKGROUND: Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., “gene tree heterogeneity”). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing “gene trees”) and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low-degree polynomial time), and have shown high accuracy under many conditions, which makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work on weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. RESULTS: Our experimental study evaluating weighted ASTRID typically shows improvements in accuracy compared to the original (unweighted) ASTRID, and shows competitive accuracy against weighted ASTRAL, the state of the art. Our re-implementation of ASTRID also improves the runtime, with marked improvements on large datasets. CONCLUSIONS: Weighted ASTRID is a new and very fast method for species tree estimation that typically improves upon ASTRID and has comparable accuracy to weighted ASTRAL, while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode. |
format | Online Article Text |
id | pubmed-10355063 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10355063 2023-07-20 Weighted ASTRID: fast and accurate species trees from weighted internode distances Liu, Baqiao Warnow, Tandy Algorithms Mol Biol Research BioMed Central 2023-07-19 /pmc/articles/PMC10355063/ /pubmed/37468904 http://dx.doi.org/10.1186/s13015-023-00230-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Weighted ASTRID: fast and accurate species trees from weighted internode distances |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355063/ https://www.ncbi.nlm.nih.gov/pubmed/37468904 http://dx.doi.org/10.1186/s13015-023-00230-6 |
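
The abstract in this record defines the internode distance between two species as the number of nodes on the path between their leaves, averaged over all gene trees. The following Python sketch illustrates only that unweighted definition on a toy input. It is not the authors' implementation: the nested-tuple tree encoding and the function names `build_graph`, `internode_distances`, and `average_internode_matrix` are assumptions made for illustration, and the branch-uncertainty weighting of weighted ASTRID is omitted because the record does not give its formula. Trees are written with a degree-3 top node so that no artificial root is counted on any path.

```python
from collections import deque
from itertools import count


def build_graph(tree):
    """Convert a nested-tuple tree (leaves are strings) to an adjacency list."""
    adj, leaves, ids = {}, {}, count()

    def visit(node):
        node_id = next(ids)
        adj[node_id] = []
        if isinstance(node, str):
            leaves[node] = node_id          # remember which node is this leaf
        else:
            for child in node:
                child_id = visit(child)
                adj[node_id].append(child_id)
                adj[child_id].append(node_id)
        return node_id

    visit(tree)
    return adj, leaves


def internode_distances(tree):
    """Number of nodes strictly between each pair of leaves in one tree."""
    adj, leaves = build_graph(tree)
    dist = {}
    for name, start in leaves.items():
        # Breadth-first search gives the edge count from this leaf to every node.
        depth = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    queue.append(v)
        for other, node in leaves.items():
            if other != name:
                # A path with k edges has k - 1 nodes between its endpoints.
                dist[tuple(sorted((name, other)))] = depth[node] - 1
    return dist


def average_internode_matrix(trees):
    """Average each pairwise internode distance over the trees containing the pair."""
    totals, counts = {}, {}
    for tree in trees:
        for pair, d in internode_distances(tree).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two hypothetical gene trees on species A-D with conflicting topologies.
trees = [("A", "B", ("C", "D")), ("A", "C", ("B", "D"))]
avg = average_internode_matrix(trees)
for pair in sorted(avg):
    print(pair, avg[pair])
```

A distance-based method such as neighbor joining would then be run on the resulting matrix to obtain the species tree. That step is left out here, and a real pipeline would read Newick gene trees rather than hand-written tuples.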