Robust Analysis of Phylogenetic Tree Space

Phylogenetic analyses often produce large numbers of trees. Mapping trees’ distribution in “tree space” can illuminate the behavior and performance of search strategies, reveal distinct clusters of optimal trees, and expose differences between different data sources or phylogenetic methods—but the h...

Bibliographic Details
Main author: Smith, Martin R
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9366458/
https://www.ncbi.nlm.nih.gov/pubmed/34963003
http://dx.doi.org/10.1093/sysbio/syab100
_version_ 1784765569678966784
author Smith, Martin R
author_facet Smith, Martin R
author_sort Smith, Martin R
collection PubMed
description Phylogenetic analyses often produce large numbers of trees. Mapping trees’ distribution in “tree space” can illuminate the behavior and performance of search strategies, reveal distinct clusters of optimal trees, and expose differences between different data sources or phylogenetic methods—but the high-dimensional spaces defined by metric distances are necessarily distorted when represented in fewer dimensions. Here, I explore the consequences of this transformation in phylogenetic search results from 128 morphological data sets, using stratigraphic congruence—a complementary aspect of tree similarity—to evaluate the utility of low-dimensional mappings. I find that phylogenetic similarities between cladograms are most accurately depicted in tree spaces derived from information-theoretic tree distances or the quartet distance. Robinson–Foulds tree spaces exhibit prominent distortions and often fail to group trees according to phylogenetic similarity, whereas the strong influence of tree shape on the Kendall–Colijn distance makes its tree space unsuitable for many purposes. Distances mapped into two or even three dimensions often display little correspondence with true distances, which can lead to profound misrepresentation of clustering structure. Without explicit testing, one cannot be confident that a tree space mapping faithfully represents the true distribution of trees, nor that visually evident structure is valid. My recommendations for tree space validation and visualization are implemented in a new graphical user interface in the “TreeDist” R package. [Multidimensional scaling; phylogenetic software; tree distance metrics; treespace projections.]
format Online
Article
Text
id pubmed-9366458
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9366458 2022-08-11 Robust Analysis of Phylogenetic Tree Space Smith, Martin R Syst Biol Points of View Oxford University Press 2021-12-28 /pmc/articles/PMC9366458/ /pubmed/34963003 http://dx.doi.org/10.1093/sysbio/syab100 Text en © The Author(s) 2021.
Published by Oxford University Press on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Points of View
Smith, Martin R
Robust Analysis of Phylogenetic Tree Space
title Robust Analysis of Phylogenetic Tree Space
title_full Robust Analysis of Phylogenetic Tree Space
title_fullStr Robust Analysis of Phylogenetic Tree Space
title_full_unstemmed Robust Analysis of Phylogenetic Tree Space
title_short Robust Analysis of Phylogenetic Tree Space
title_sort robust analysis of phylogenetic tree space
topic Points of View
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9366458/
https://www.ncbi.nlm.nih.gov/pubmed/34963003
http://dx.doi.org/10.1093/sysbio/syab100
work_keys_str_mv AT smithmartinr robustanalysisofphylogenetictreespace
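
The description in this record warns that distances mapped into two or three dimensions may correspond poorly with the true distances. One way to check a mapping is to compare the original pairwise distances with the distances between the mapped points. The Python sketch below is an illustration only and is not taken from the article or from the "TreeDist" R package. The toy points, the function names `pairwise_distances` and `normalized_stress`, and the choice of a stress-style discrepancy measure (squared differences normalised by the squared original distances) are all assumptions made for this example.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def normalized_stress(original, mapped):
    """Stress-style discrepancy between two equally ordered distance lists.

    0 means the mapping preserves every distance exactly; larger values
    mean greater distortion. (Assumed measure, chosen for illustration.)
    """
    numerator = sum((d - m) ** 2 for d, m in zip(original, mapped))
    denominator = sum(d ** 2 for d in original)
    return math.sqrt(numerator / denominator)


# Hypothetical toy data: four points in 3D stand in for trees in a
# higher-dimensional tree space.
points_3d = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]

# A deliberately crude 2D "mapping": drop the third coordinate.
points_2d = [p[:2] for p in points_3d]

original = pairwise_distances(points_3d)
mapped = pairwise_distances(points_2d)
stress = normalized_stress(original, mapped)

print(f"stress of the 2D mapping: {stress:.3f}")
```

In this toy case the last two points are distinct in three dimensions but coincide in the two-dimensional mapping, so a plot alone would suggest they are identical. A real analysis would start from a tree distance matrix and a mapping produced by a method such as multidimensional scaling.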