The prevalence of terraced treescapes in analyses of phylogenetic data sets

BACKGROUND: The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terrace...

Bibliographic Details
Main Authors:	Dobrin, Barbara H., Zwickl, Derrick J., Sanderson, Michael J.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5885316/ https://www.ncbi.nlm.nih.gov/pubmed/29618314 http://dx.doi.org/10.1186/s12862-018-1162-9

_version_	1783311959976837120
author	Dobrin, Barbara H. Zwickl, Derrick J. Sanderson, Michael J.
author_facet	Dobrin, Barbara H. Zwickl, Derrick J. Sanderson, Michael J.
author_sort	Dobrin, Barbara H.
collection	PubMed
description	BACKGROUND: The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling “sufficiency”. We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. RESULTS: Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved a theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support. CONCLUSIONS: If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates to data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
format	Online Article Text
id	pubmed-5885316
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5885316 2018-04-09 The prevalence of terraced treescapes in analyses of phylogenetic data sets Dobrin, Barbara H. Zwickl, Derrick J. Sanderson, Michael J. BMC Evol Biol Research Article BioMed Central 2018-04-04 /pmc/articles/PMC5885316/ /pubmed/29618314 http://dx.doi.org/10.1186/s12862-018-1162-9 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Dobrin, Barbara H. Zwickl, Derrick J. Sanderson, Michael J. The prevalence of terraced treescapes in analyses of phylogenetic data sets
title	The prevalence of terraced treescapes in analyses of phylogenetic data sets
title_full	The prevalence of terraced treescapes in analyses of phylogenetic data sets
title_fullStr	The prevalence of terraced treescapes in analyses of phylogenetic data sets
title_full_unstemmed	The prevalence of terraced treescapes in analyses of phylogenetic data sets
title_short	The prevalence of terraced treescapes in analyses of phylogenetic data sets
title_sort	prevalence of terraced treescapes in analyses of phylogenetic data sets
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5885316/ https://www.ncbi.nlm.nih.gov/pubmed/29618314 http://dx.doi.org/10.1186/s12862-018-1162-9
work_keys_str_mv	AT dobrinbarbarah theprevalenceofterracedtreescapesinanalysesofphylogeneticdatasets AT zwicklderrickj theprevalenceofterracedtreescapesinanalysesofphylogeneticdatasets AT sandersonmichaelj theprevalenceofterracedtreescapesinanalysesofphylogeneticdatasets AT dobrinbarbarah prevalenceofterracedtreescapesinanalysesofphylogeneticdatasets AT zwicklderrickj prevalenceofterracedtreescapesinanalysesofphylogeneticdatasets AT sandersonmichaelj prevalenceofterracedtreescapesinanalysesofphylogeneticdatasets
