QuCo: quartet-based co-estimation of species trees and gene trees

MOTIVATION: Phylogenomics faces a dilemma: on the one hand, most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to moderately large numbers of species. The summary-based methods, which first infer gene tr...

Bibliographic Details
Main authors: Rabiee, Maryam; Mirarab, Siavash
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235488/
https://www.ncbi.nlm.nih.gov/pubmed/35758818
http://dx.doi.org/10.1093/bioinformatics/btac265
author Rabiee, Maryam
Mirarab, Siavash
collection PubMed
description MOTIVATION: Phylogenomics faces a dilemma: on the one hand, most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to moderately large numbers of species. The summary-based methods, which first infer gene trees independently and then combine them, are much more scalable but are prone to gene tree estimation error, which is inevitable when inferring trees from limited-length data. Gene tree estimation error is not just random noise and can create biases such as long-branch attraction.
RESULTS: We introduce a scalable likelihood-based approach to co-estimation under the multi-species coalescent model. The method, called quartet co-estimation (QuCo), takes as input independently inferred distributions over gene trees and computes the most likely species tree topology and internal branch length for each quartet, marginalizing over gene tree topologies and ignoring branch lengths by making several simplifying assumptions. It then updates the gene tree posterior probabilities based on the species tree. The focus on gene tree topologies and the heuristic division to quartets enables fast likelihood calculations. We benchmark our method with extensive simulations for quartet trees in zones known to produce biased species trees and further with larger trees. We also run QuCo on a biological dataset of bees. Our results show better accuracy than the summary-based approach ASTRAL run on estimated gene trees.
AVAILABILITY AND IMPLEMENTATION: QuCo is available on https://github.com/maryamrabiee/quco.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
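The per-quartet step described in the abstract can be illustrated with a minimal Python sketch. It is not QuCo's implementation: the function names, the grid search over branch lengths, and the uniform prior over the three quartet topologies are all assumptions made for illustration. It relies only on the standard multi-species coalescent result that, for a quartet species tree with internal branch length d (in coalescent units), the matching gene tree topology has probability 1 - (2/3)e^(-d) and each of the two alternatives has probability (1/3)e^(-d).

```python
import math

# The three possible unrooted topologies on four taxa a, b, c, d.
TOPOLOGIES = ("ab|cd", "ac|bd", "ad|bc")


def msc_quartet_probs(species_topology, d):
    """Gene tree topology probabilities under the multi-species coalescent
    for a quartet species tree with internal branch length d (coalescent units)."""
    minor = math.exp(-d) / 3.0
    return {t: (1.0 - 2.0 * minor) if t == species_topology else minor
            for t in TOPOLOGIES}


def log_likelihood(species_topology, d, gene_posteriors):
    """Log-likelihood (up to a constant) of a species quartet, marginalizing
    over gene tree topologies. Assumes each gene's posterior was inferred
    under a uniform topology prior (an assumption of this sketch)."""
    probs = msc_quartet_probs(species_topology, d)
    return sum(math.log(sum(probs[t] * post[t] for t in TOPOLOGIES))
               for post in gene_posteriors)


def best_quartet(gene_posteriors, grid=None):
    """Hypothetical helper: grid search for the most likely topology and d."""
    if grid is None:
        grid = [i * 0.01 for i in range(501)]  # d from 0.00 to 5.00
    candidates = [(t, d) for t in TOPOLOGIES for d in grid]
    return max(candidates,
               key=lambda td: log_likelihood(td[0], td[1], gene_posteriors))


def update_posteriors(species_topology, d, gene_posteriors):
    """Reweight each gene's topology posterior by the coalescent probabilities
    implied by the chosen species quartet, then renormalize."""
    probs = msc_quartet_probs(species_topology, d)
    updated = []
    for post in gene_posteriors:
        weights = {t: probs[t] * post[t] for t in TOPOLOGIES}
        total = sum(weights.values())
        updated.append({t: w / total for t, w in weights.items()})
    return updated


# Toy input: six genes favor ab|cd, two favor ac|bd (made-up numbers).
gene_posteriors = (
    [{"ab|cd": 0.8, "ac|bd": 0.1, "ad|bc": 0.1}] * 6
    + [{"ab|cd": 0.2, "ac|bd": 0.6, "ad|bc": 0.2}] * 2
)
topology, branch_length = best_quartet(gene_posteriors)
new_posteriors = update_posteriors(topology, 1.0, gene_posteriors)
```

In this toy input the two genes that individually favor ac|bd are pulled toward the topology supported by the other six, which is the intuition behind updating gene tree posteriors from the species tree. The published method and its simplifying assumptions are described in the article itself.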
format Online
Article
Text
id pubmed-9235488
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9235488 2022-06-29 QuCo: quartet-based co-estimation of species trees and gene trees Rabiee, Maryam Mirarab, Siavash Bioinformatics ISCB/Ismb 2022
Oxford University Press 2022-06-27 /pmc/articles/PMC9235488/ /pubmed/35758818 http://dx.doi.org/10.1093/bioinformatics/btac265 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title QuCo: quartet-based co-estimation of species trees and gene trees
topic ISCB/Ismb 2022