Restoring the Duality between Principal Components of a Distance Matrix and Linear Combinations of Predictors, with Application to Studies of the Microbiome
Appreciation of the importance of the microbiome is increasing, as sequencing technology has made it possible to ascertain the microbial content of a variety of samples. Studies that sequence the 16S rRNA gene, ubiquitous in and nearly exclusive to bacteria, have proliferated in the medical literatu...
Main Authors: | Satten, Glen A., Tyx, Robert E., Rivera, Angel J., Stanfill, Stephen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5234780/ https://www.ncbi.nlm.nih.gov/pubmed/28085878 http://dx.doi.org/10.1371/journal.pone.0168131 |
_version_ | 1782495049188114432 |
---|---|
author | Satten, Glen A. Tyx, Robert E. Rivera, Angel J. Stanfill, Stephen |
author_facet | Satten, Glen A. Tyx, Robert E. Rivera, Angel J. Stanfill, Stephen |
author_sort | Satten, Glen A. |
collection | PubMed |
description | Appreciation of the importance of the microbiome is increasing, as sequencing technology has made it possible to ascertain the microbial content of a variety of samples. Studies that sequence the 16S rRNA gene, ubiquitous in and nearly exclusive to bacteria, have proliferated in the medical literature. After sequences are binned into operational taxonomic units (OTUs) or species, data from these studies are summarized in a data matrix with the observed counts from each OTU for each sample. Analysis often reduces these data further to a matrix of pairwise distances or dissimilarities; plotting the first two or three principal components (PCs) of this distance matrix often reveals meaningful groupings in the data. However, once the distance matrix is calculated, it is no longer clear which OTUs or species are important to the observed clustering; further, the PCs are hard to interpret and cannot be calculated for subsequent observations. We show how to construct approximate decompositions of the data matrix that pair PCs with linear combinations of OTU or species frequencies, and show how these decompositions can be used to construct biplots, select important OTUs and partition the variability in the data matrix into contributions corresponding to PCs of an arbitrary distance or dissimilarity matrix. To illustrate our approach, we conduct an analysis of the bacteria found in 45 smokeless tobacco samples. |
format | Online Article Text |
id | pubmed-5234780 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5234780 2017-02-06 PLoS One Research Article Public Library of Science 2017-01-13 /pmc/articles/PMC5234780/ /pubmed/28085878 http://dx.doi.org/10.1371/journal.pone.0168131 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
title | Restoring the Duality between Principal Components of a Distance Matrix and Linear Combinations of Predictors, with Application to Studies of the Microbiome |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5234780/ https://www.ncbi.nlm.nih.gov/pubmed/28085878 http://dx.doi.org/10.1371/journal.pone.0168131 |
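
The description above mentions reducing an OTU count matrix to pairwise dissimilarities and then plotting the leading principal components of that distance matrix. A minimal sketch of that standard step (principal coordinates analysis, also called classical multidimensional scaling) might look like the following. It is not the authors' decomposition, which additionally pairs each PC with a linear combination of OTU frequencies. The choice of Bray-Curtis dissimilarity, the function names `bray_curtis` and `pcoa`, and the toy count matrix are all assumptions made for illustration only.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of a count matrix."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            dist[i, j] = dist[j, i] = num / den if den > 0 else 0.0
    return dist


def pcoa(dist, n_components=2):
    """Classical MDS: eigendecompose the double-centered matrix -0.5 * J D^2 J."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean dissimilarities can yield negative eigenvalues; clip before sqrt.
    top = np.clip(eigvals[:n_components], 0.0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(top)
    return coords, eigvals


# Hypothetical toy data: 4 samples (rows) by 4 OTUs (columns), invented for this sketch.
counts = np.array([
    [30, 5, 0, 1],
    [28, 7, 1, 0],
    [2, 1, 25, 30],
    [0, 3, 27, 26],
])
dist = bray_curtis(counts)
coords, eigvals = pcoa(dist)
print(coords)
```

Each row of `coords` gives one sample's position on the first two principal coordinates. As the description notes, these coordinates depend only on the distance matrix, so they do not by themselves indicate which OTUs drive the separation between samples.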