Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees

Bibliographic Details
Main Authors: Zhang, Chao; Mirarab, Siavash
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9750496/
https://www.ncbi.nlm.nih.gov/pubmed/36201617
http://dx.doi.org/10.1093/molbev/msac215
author Zhang, Chao
Mirarab, Siavash
collection PubMed
description Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
format Online
Article
Text
id pubmed-9750496
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9750496 2022-12-15 Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Zhang, Chao; Mirarab, Siavash. Mol Biol Evol. Methods. Oxford University Press 2022-10-06 /pmc/articles/PMC9750496/ /pubmed/36201617 http://dx.doi.org/10.1093/molbev/msac215 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees
topic Methods