
STELAR: a statistically consistent coalescent-based species tree estimation method by maximizing triplet consistency

BACKGROUND: Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, estimating a species tree from a collection of gene trees can be complicated due to the presence of gene tree incongruence resulting from incomplete lineage...


Bibliographic Details
Main authors: Islam, Mazharul, Sarker, Kowshika, Das, Trisha, Reaz, Rezwana, Bayzid, Md. Shamsuzzoha
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7011378/
https://www.ncbi.nlm.nih.gov/pubmed/32039704
http://dx.doi.org/10.1186/s12864-020-6519-y
author Islam, Mazharul
Sarker, Kowshika
Das, Trisha
Reaz, Rezwana
Bayzid, Md. Shamsuzzoha
collection PubMed
description BACKGROUND: Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, estimating a species tree from a collection of gene trees can be complicated due to the presence of gene tree incongruence resulting from incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent process. Maximum likelihood and Bayesian MCMC methods can potentially result in accurate trees, but they do not scale well to large datasets. RESULTS: We present STELAR (Species Tree Estimation by maximizing tripLet AgReement), a new fast and highly accurate statistically consistent coalescent-based method for estimating species trees from a collection of gene trees. We formalized the constrained triplet consensus (CTC) problem and showed that the solution to the CTC problem is a statistically consistent estimate of the species tree under the multi-species coalescent (MSC) model. STELAR is an efficient dynamic programming based solution to the CTC problem which is highly accurate and scalable. We evaluated the accuracy of STELAR in comparison with SuperTriplets, which is an alternate fast and highly accurate triplet-based supertree method, and with MP-EST and ASTRAL – two of the most popular and accurate coalescent-based methods. Experimental results suggest that STELAR matches the accuracy of ASTRAL and improves on MP-EST and SuperTriplets. CONCLUSIONS: Theoretical and empirical results (on both simulated and real biological datasets) suggest that STELAR is a valuable technique for species tree estimation from gene tree distributions.
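The triplet-agreement objective named in the description can be illustrated with a small brute-force sketch. This is not the STELAR implementation, which the abstract describes as a dynamic programming solution to the CTC problem; it only scores a given candidate species tree by counting, over all gene trees, the rooted triplets on which the two trees agree. The nested-tuple tree encoding and the function names (`leaves`, `clusters`, `triplet_topology`, `triplet_agreement`) are assumptions made for this illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels of a rooted tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Collect the leaf set below every internal node of the tree."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clusters(child))
    return found


def triplet_topology(tree_clusters, a, b, c):
    """Return the pair grouped apart from the third taxon, or None if unresolved.

    Assumes a, b and c are all leaves of the tree the clusters came from.
    """
    trio = {a, b, c}
    for cluster in tree_clusters:
        inside = trio & cluster
        if len(inside) == 2:
            return frozenset(inside)
    return None


def triplet_agreement(species_tree, gene_trees):
    """Count gene-tree triplets that have the same topology in species_tree."""
    sp_clusters = clusters(species_tree)
    sp_leaves = leaves(species_tree)
    score = 0
    for gene_tree in gene_trees:
        gt_clusters = clusters(gene_tree)
        # Only compare triplets whose three taxa occur in both trees.
        shared = sorted(leaves(gene_tree) & sp_leaves)
        for a, b, c in combinations(shared, 3):
            topo = triplet_topology(gt_clusters, a, b, c)
            if topo is not None and topo == triplet_topology(sp_clusters, a, b, c):
                score += 1
    return score


# Hypothetical toy input: three gene trees on four taxa.
gene_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]
candidate_1 = ((("A", "B"), "C"), "D")
candidate_2 = ((("A", "C"), "B"), "D")

print(triplet_agreement(candidate_1, gene_trees))  # 11
print(triplet_agreement(candidate_2, gene_trees))  # 10
```

In this toy input the first candidate agrees with more gene-tree triplets (11 against 10), so it would be preferred under this objective. Enumerating every triplet for every candidate tree is only practical for small examples.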
format Online
Article
Text
id pubmed-7011378
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7011378 2020-02-14 STELAR: a statistically consistent coalescent-based species tree estimation method by maximizing triplet consistency Islam, Mazharul Sarker, Kowshika Das, Trisha Reaz, Rezwana Bayzid, Md. Shamsuzzoha BMC Genomics Software
BioMed Central 2020-02-10 /pmc/articles/PMC7011378/ /pubmed/32039704 http://dx.doi.org/10.1186/s12864-020-6519-y Text en © The Author(s) 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title STELAR: a statistically consistent coalescent-based species tree estimation method by maximizing triplet consistency
topic Software