
Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison

Bibliographic details
Main authors:	Matsen IV, Frederick A.; Evans, Steven N.
Format:	Online article, text
Language:	English
Published:	Public Library of Science, 2013
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3594297/ https://www.ncbi.nlm.nih.gov/pubmed/23505415 http://dx.doi.org/10.1371/journal.pone.0056859

Abstract

Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate “average” of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome.
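The abstract describes both methods only in outline. Below is a minimal Python sketch of the two ideas on a hypothetical four-edge tree. It is not the authors' implementation, and it rests on stated assumptions. Each sample's mass sits at the leaves rather than at arbitrary positions along edges. For edge PCA, the per-edge value is the mass on the distal side minus the mass on the proximal side, not weighted by branch length. For squash clustering, the distance between averaged samples is assumed to be the Kantorovich-Rubinstein (earth mover's) distance on the tree, and clusters are averaged in proportion to their number of samples. The tree `TREE`, the sample dictionaries, and every function name are hypothetical.

```python
import itertools

import numpy as np

# Hypothetical toy tree: each edge is named after its child node and maps to
# (parent, branch_length); "root" is the root node.
TREE = {
    "x": ("root", 1.0),
    "A": ("x", 0.5),
    "B": ("x", 0.5),
    "C": ("root", 2.0),
}
EDGES = list(TREE)
LENGTHS = np.array([TREE[e][1] for e in EDGES])


def edges_to_root(node):
    """Edges (named by child node) on the path from `node` up to the root."""
    path = []
    while node in TREE:
        path.append(node)
        node = TREE[node][0]
    return path


def distal_mass(sample):
    """Mass on the distal (leafward) side of each edge, in EDGES order.

    `sample` maps leaf names to masses summing to 1 (simplifying assumption:
    all mass sits at leaves, not at positions along edges)."""
    below = dict.fromkeys(EDGES, 0.0)
    for leaf, mass in sample.items():
        for edge in edges_to_root(leaf):
            below[edge] += mass
    return np.array([below[e] for e in EDGES])


def kr_distance(d1, d2):
    """Kantorovich-Rubinstein (earth mover's) distance between two samples
    given as distal-mass vectors: sum of branch length * |mass difference|."""
    return float(np.sum(LENGTHS * np.abs(d1 - d2)))


def edge_pca(samples, k=1):
    """PCA on the samples-by-edges matrix of (distal - proximal) mass.

    Returns the top-k axes (signed weights on edges) and the sample scores."""
    # With total mass 1, proximal mass is 1 - distal, so the difference is 2*distal - 1.
    X = np.array([2 * distal_mass(s) - 1 for s in samples])
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(samples) - 1)
    vals, vecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    axes = vecs[:, top]
    return axes, Xc @ axes


def squash_cluster(samples):
    """Repeatedly merge the two clusters whose averaged samples are closest.

    A merged cluster is the size-weighted average of its members; each new
    edge's length is the KR distance from the merged average to the child.
    Returns (new_id, left, left_len, right, right_len) tuples."""
    clusters = {i: (distal_mass(s), 1) for i, s in enumerate(samples)}
    merges, next_id = [], len(samples)
    while len(clusters) > 1:
        a, b = min(
            itertools.combinations(clusters, 2),
            key=lambda pair: kr_distance(clusters[pair[0]][0], clusters[pair[1]][0]),
        )
        (da, na), (db, nb) = clusters.pop(a), clusters.pop(b)
        avg = (na * da + nb * db) / (na + nb)
        merges.append((next_id, a, kr_distance(avg, da), b, kr_distance(avg, db)))
        clusters[next_id] = (avg, na + nb)
        next_id += 1
    return merges


samples = [{"A": 1.0}, {"A": 0.5, "B": 0.5}, {"C": 1.0}]
axes, scores = edge_pca(samples)
print([f"{e}: {w:+.3f}" for e, w in zip(EDGES, axes[:, 0])])
for node, left, d_left, right, d_right in squash_cluster(samples):
    print(node, (left, round(d_left, 3)), (right, round(d_right, 3)))
# The squash loop prints:
# 3 (0, 0.25) (1, 0.25)
# 4 (2, 2.333) (3, 1.167)
```

Eigenvector signs are arbitrary, so the printed edge weights may come out with every sign flipped. Because each merged sample is a convex combination of its children, the two new edge lengths at a node add up to the KR distance between the children's averaged samples. In the second merge above, 2.333 + 1.167 = 3.5. UPGMA would instead report an average of pairwise distances.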

Journal:	PLoS One
Publication date:	2013-03-11
Copyright:	© 2013 Matsen, Evans
License:	This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
