
Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples

Bibliographic Details
Main authors: Libiger, Ondrej; Schork, Nicholas J.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2015
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681790/
https://www.ncbi.nlm.nih.gov/pubmed/26734061
http://dx.doi.org/10.3389/fgene.2015.00350
author Libiger, Ondrej
Schork, Nicholas J.
collection PubMed
description It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. 
Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples.
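The simulation design summarized in the description (spike a change in abundance into one set of samples, apply a test, and record whether the change is detected) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: abundances are drawn from a hypothetical log-normal distribution rather than the real hand microbiome dataset, a simple permutation test on one feature's mean abundance stands in for the methods compared in the study, and the function names and parameter values are invented for this example.

```python
import random
import statistics


def simulate_counts(n_samples, fold_change, rng):
    """Draw skewed, non-negative abundances for one feature (assumed log-normal)."""
    return [int(rng.lognormvariate(3.0, 1.0) * fold_change) for _ in range(n_samples)]


def permutation_pvalue(group_a, group_b, n_perm, rng):
    """Two-sided permutation test on the absolute difference in group means."""
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def estimate_power(fold_change, n_samples=15, n_sims=100, n_perm=199, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which the spiked change is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sims):
        control = simulate_counts(n_samples, 1.0, rng)
        spiked = simulate_counts(n_samples, fold_change, rng)
        if permutation_pvalue(control, spiked, n_perm, rng) <= alpha:
            detected += 1
    return detected / n_sims


if __name__ == "__main__":
    # A fold change of 1.0 is the null case; larger values are spiked changes.
    for fc in (1.0, 2.0, 4.0):
        print(f"fold change {fc}: estimated power {estimate_power(fc):.2f}")
```

Replacing `permutation_pvalue` with a different test while leaving `estimate_power` unchanged mirrors the idea of applying each technique to the same simulated data and comparing detection rates.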
format Online
Article
Text
id pubmed-4681790
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-46817902016-01-05 Front Genet Genetics Frontiers Media S.A. 2015-12-17 /pmc/articles/PMC4681790/ /pubmed/26734061 http://dx.doi.org/10.3389/fgene.2015.00350 Text en Copyright © 2015 Libiger and Schork. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples
topic Genetics