The performance of coalescent-based species tree estimation methods under models of missing data

BACKGROUND: Estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer, that result in gene trees that differ from each other and from the species phylogeny. Methods to estimate species trees...

Bibliographic Details
Main authors: Nute, Michael; Chou, Jed; Molloy, Erin K.; Warnow, Tandy
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998899/
https://www.ncbi.nlm.nih.gov/pubmed/29745854
http://dx.doi.org/10.1186/s12864-018-4619-8
author Nute, Michael
Chou, Jed
Molloy, Erin K.
Warnow, Tandy
collection PubMed
description BACKGROUND: Estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer, that result in gene trees that differ from each other and from the species phylogeny. Methods to estimate species trees in the presence of gene tree discord due to incomplete lineage sorting have been developed and proved to be statistically consistent when gene tree discord is due only to incomplete lineage sorting and every gene tree includes the full set of species. RESULTS: We establish statistical consistency of certain coalescent-based species tree estimation methods under some models of taxon deletion from genes. We also evaluate the impact of missing data on four species tree estimation methods (ASTRAL-II, ASTRID, MP-EST, and SVDquartets) using simulated datasets with varying levels of incomplete lineage sorting, gene tree estimation error, and degrees/patterns of missing data. CONCLUSIONS: All the species tree estimation methods improved in accuracy as the number of genes increased and often produced highly accurate species trees even when the amount of missing data was large. These results together indicate that accurate species tree estimation is possible under a variety of conditions, even when there are substantial amounts of missing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4619-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5998899
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5998899 2018-06-25 The performance of coalescent-based species tree estimation methods under models of missing data Nute, Michael Chou, Jed Molloy, Erin K. Warnow, Tandy BMC Genomics Research
BioMed Central 2018-05-08 /pmc/articles/PMC5998899/ /pubmed/29745854 http://dx.doi.org/10.1186/s12864-018-4619-8 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The performance of coalescent-based species tree estimation methods under models of missing data
topic Research