The performance of coalescent-based species tree estimation methods under models of missing data
BACKGROUND: Estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer, that result in gene trees that differ from each other and from the species phylogeny. Methods to estimate species trees...
Main authors: | Nute, Michael; Chou, Jed; Molloy, Erin K.; Warnow, Tandy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998899/ https://www.ncbi.nlm.nih.gov/pubmed/29745854 http://dx.doi.org/10.1186/s12864-018-4619-8 |
_version_ | 1783331326135369728 |
---|---|
author | Nute, Michael; Chou, Jed; Molloy, Erin K.; Warnow, Tandy |
collection | PubMed |
description | BACKGROUND: Estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer, that result in gene trees that differ from each other and from the species phylogeny. Methods to estimate species trees in the presence of gene tree discord due to incomplete lineage sorting have been developed and proved to be statistically consistent when gene tree discord is due only to incomplete lineage sorting and every gene tree includes the full set of species. RESULTS: We establish statistical consistency of certain coalescent-based species tree estimation methods under some models of taxon deletion from genes. We also evaluate the impact of missing data on four species tree estimation methods (ASTRAL-II, ASTRID, MP-EST, and SVDquartets) using simulated datasets with varying levels of incomplete lineage sorting, gene tree estimation error, and degrees/patterns of missing data. CONCLUSIONS: All the species tree estimation methods improved in accuracy as the number of genes increased and often produced highly accurate species trees even when the amount of missing data was large. These results together indicate that accurate species tree estimation is possible under a variety of conditions, even when there are substantial amounts of missing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4619-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5998899 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5998899 2018-06-25 BMC Genomics Research BioMed Central 2018-05-08 /pmc/articles/PMC5998899/ /pubmed/29745854 http://dx.doi.org/10.1186/s12864-018-4619-8 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | The performance of coalescent-based species tree estimation methods under models of missing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998899/ https://www.ncbi.nlm.nih.gov/pubmed/29745854 http://dx.doi.org/10.1186/s12864-018-4619-8 |