Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees

Bibliographic details
Main authors: Nye, Tom M W; Tang, Xiaoxian; Weyenberg, Grady; Yoshida, Ruriko
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793493/
https://www.ncbi.nlm.nih.gov/pubmed/29422694
http://dx.doi.org/10.1093/biomet/asx047
Description: Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample’s structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the $k$th principal component in Euclidean space: the locus of the weighted Fréchet mean of $k+1$ vertex trees when the weights vary over the $k$-simplex. We establish some basic properties of these objects, in particular showing that they have dimension $k$, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
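
The central construction here, the weighted Fréchet mean of several vertex points with weights ranging over a simplex, can be illustrated in ordinary Euclidean space, where the Fréchet mean $\arg\min_x \sum_i w_i\, d(x, v_i)^2$ reduces to the weighted average $\sum_i w_i v_i$ and the locus is simply the convex hull of the vertices. The Python sketch below is an illustration under stated assumptions, not the authors' algorithm: the helper names (`euclidean_weighted_mean`, `sturm_weighted_mean`, `euclidean_geodesic`, `locus_sample`) are hypothetical, and `sturm_weighted_mean` implements Sturm's inductive mean, a standard estimator of barycentres in CAT(0) spaces that this record does not say the paper uses. To move to tree space (for example the Billera–Holmes–Vogtmann space, which is CAT(0)), the straight-line `euclidean_geodesic` would have to be replaced by a tree-space geodesic routine, which is not provided here.

```python
import random
from typing import Callable, Sequence

import numpy as np

Point = np.ndarray
# Hypothetical interface: geodesic(a, b, t) returns the point a fraction t of the
# way along the geodesic from a to b (t = 0 gives a, t = 1 gives b).
Geodesic = Callable[[Point, Point, float], Point]


def euclidean_geodesic(a: Point, b: Point, t: float) -> Point:
    """Straight-line geodesic in R^n (a stand-in for a tree-space geodesic)."""
    return (1.0 - t) * a + t * b


def euclidean_weighted_mean(vertices: Sequence[Point], weights: Sequence[float]) -> Point:
    """Closed form in R^n: argmin_x sum_i w_i |x - v_i|^2 equals sum_i w_i v_i."""
    return np.average(np.asarray(vertices, dtype=float), axis=0, weights=weights)


def sturm_weighted_mean(vertices: Sequence[Point], weights: Sequence[float],
                        geodesic: Geodesic, n_iter: int = 20000,
                        seed: int = 0) -> Point:
    """Sturm's inductive mean: draw vertex i with probability w_i and move a
    fraction 1/(n+1) along the geodesic towards each new draw. In a CAT(0)
    space the iterate approaches the weighted Frechet mean as n_iter grows."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must lie on the simplex")
    rng = random.Random(seed)
    idx = list(range(len(vertices)))
    x = vertices[rng.choices(idx, weights=w)[0]]  # S_1 = first draw
    for n in range(1, n_iter):
        v = vertices[rng.choices(idx, weights=w)[0]]
        x = geodesic(x, v, 1.0 / (n + 1))  # S_{n+1}
    return x


def locus_sample(vertices: Sequence[Point], n_samples: int = 200, seed: int = 0) -> np.ndarray:
    """Sample the locus of the weighted mean as the weights vary uniformly
    over the simplex (Dirichlet(1, ..., 1)); returns one point per row."""
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(len(vertices)), size=n_samples)
    return W @ np.asarray(vertices, dtype=float)


if __name__ == "__main__":
    V = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    w = [0.2, 0.3, 0.5]
    print(euclidean_weighted_mean(V, w))
    print(sturm_weighted_mean(V, w, euclidean_geodesic))
    P = locus_sample(V)
    print(np.linalg.matrix_rank(P - P.mean(axis=0)))
```

With three affinely independent vertices ($k = 2$), the sampled locus has rank 2 even when the vertices sit in a higher-dimensional ambient space: in Euclidean space it is just the triangle they span. Passing the geodesic as a parameter keeps the inductive-mean loop independent of the space, which is the only part that would carry over to a tree-space setting.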
Journal: Biometrika (Oxford University Press). Dates: 2017-12 (issue), 2017-09-27 (online). DOI: 10.1093/biomet/asx047
Rights: © 2017 Biometrika Trust. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.