Restoring the Duality between Principal Components of a Distance Matrix and Linear Combinations of Predictors, with Application to Studies of the Microbiome

Appreciation of the importance of the microbiome is increasing, as sequencing technology has made it possible to ascertain the microbial content of a variety of samples. Studies that sequence the 16S rRNA gene, ubiquitous in and nearly exclusive to bacteria, have proliferated in the medical literatu...

Bibliographic Details
Main Authors: Satten, Glen A.; Tyx, Robert E.; Rivera, Angel J.; Stanfill, Stephen
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5234780/
https://www.ncbi.nlm.nih.gov/pubmed/28085878
http://dx.doi.org/10.1371/journal.pone.0168131
author Satten, Glen A.
Tyx, Robert E.
Rivera, Angel J.
Stanfill, Stephen
collection PubMed
description Appreciation of the importance of the microbiome is increasing, as sequencing technology has made it possible to ascertain the microbial content of a variety of samples. Studies that sequence the 16S rRNA gene, ubiquitous in and nearly exclusive to bacteria, have proliferated in the medical literature. After sequences are binned into operational taxonomic units (OTUs) or species, data from these studies are summarized in a data matrix with the observed counts from each OTU for each sample. Analysis often reduces these data further to a matrix of pairwise distances or dissimilarities; plotting the first two or three principal components (PCs) of this distance matrix often reveals meaningful groupings in the data. However, once the distance matrix is calculated, it is no longer clear which OTUs or species are important to the observed clustering; further, the PCs are hard to interpret and cannot be calculated for subsequent observations. We show how to construct approximate decompositions of the data matrix that pair PCs with linear combinations of OTU or species frequencies, and show how these decompositions can be used to construct biplots, select important OTUs and partition the variability in the data matrix into contributions corresponding to PCs of an arbitrary distance or dissimilarity matrix. To illustrate our approach, we conduct an analysis of the bacteria found in 45 smokeless tobacco samples.
format Online
Article
Text
id pubmed-5234780
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5234780 2017-02-06 Restoring the Duality between Principal Components of a Distance Matrix and Linear Combinations of Predictors, with Application to Studies of the Microbiome Satten, Glen A. Tyx, Robert E. Rivera, Angel J. Stanfill, Stephen PLoS One Research Article
Public Library of Science 2017-01-13 /pmc/articles/PMC5234780/ /pubmed/28085878 http://dx.doi.org/10.1371/journal.pone.0168131 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Restoring the Duality between Principal Components of a Distance Matrix and Linear Combinations of Predictors, with Application to Studies of the Microbiome
topic Research Article