A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data
Numerous metagenomic studies aim to discover associations between the microbial composition of an environment (e.g., gut, skin, oral) and a phenotype of interest. Multivariate analysis is often performed in these studies without critical a priori knowledge of which taxa are associated with the pheno...
Main authors: | Hinton, Andrew L.; Mucha, Peter J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8978828/ https://www.ncbi.nlm.nih.gov/pubmed/35387076 http://dx.doi.org/10.3389/fmicb.2022.837396 |
_version_ | 1784681039842508800 |
---|---|
author | Hinton, Andrew L. Mucha, Peter J. |
author_facet | Hinton, Andrew L. Mucha, Peter J. |
author_sort | Hinton, Andrew L. |
collection | PubMed |
description | Numerous metagenomic studies aim to discover associations between the microbial composition of an environment (e.g., gut, skin, oral) and a phenotype of interest. Multivariate analysis is often performed in these studies without critical a priori knowledge of which taxa are associated with the phenotype being studied. This approach typically reduces statistical power in settings where the true associations among only a few taxa are obscured by high dimensionality (i.e., sparse association signals). At the same time, low sample size and compositional sample space constraints may reduce beyond-study generalizability if not properly accounted for. To address these difficulties, we developed the Selection-Energy-Permutation (SelEnergyPerm) method, a nonparametric group association test with embedded feature selection that directly accounts for compositional constraints using parsimonious logratio signatures between taxonomic features, for characterizing and understanding alterations in microbial community structure. Simulation results show SelEnergyPerm selects small independent sets of logratios that capture strong associations in a range of scenarios. Additionally, our simulation results demonstrate SelEnergyPerm consistently detects/rejects associations in synthetic data with sparse, dense, or no association signals. We demonstrate the novel benefits of our method in four case studies utilizing publicly available 16S amplicon and whole-genome sequencing datasets. Our R implementation of Selection-Energy-Permutation, including an example demonstration and the code to generate all of the scenarios used here, is available at https://www.github.com/andrew84830813/selEnergyPermR. |
format | Online Article Text |
id | pubmed-8978828 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-89788282022-04-05 A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data Hinton, Andrew L. Mucha, Peter J. Front Microbiol Microbiology Numerous metagenomic studies aim to discover associations between the microbial composition of an environment (e.g., gut, skin, oral) and a phenotype of interest. Multivariate analysis is often performed in these studies without critical a priori knowledge of which taxa are associated with the phenotype being studied. This approach typically reduces statistical power in settings where the true associations among only a few taxa are obscured by high dimensionality (i.e., sparse association signals). At the same time, low sample size and compositional sample space constraints may reduce beyond-study generalizability if not properly accounted for. To address these difficulties, we developed the Selection-Energy-Permutation (SelEnergyPerm) method, a nonparametric group association test with embedded feature selection that directly accounts for compositional constraints using parsimonious logratio signatures between taxonomic features, for characterizing and understanding alterations in microbial community structure. Simulation results show SelEnergyPerm selects small independent sets of logratios that capture strong associations in a range of scenarios. Additionally, our simulation results demonstrate SelEnergyPerm consistently detects/rejects associations in synthetic data with sparse, dense, or no association signals. We demonstrate the novel benefits of our method in four case studies utilizing publicly available 16S amplicon and whole-genome sequencing datasets. Our R implementation of Selection-Energy-Permutation, including an example demonstration and the code to generate all of the scenarios used here, is available at https://www.github.com/andrew84830813/selEnergyPermR. Frontiers Media S.A. 
2022-03-21 /pmc/articles/PMC8978828/ /pubmed/35387076 http://dx.doi.org/10.3389/fmicb.2022.837396 Text en Copyright © 2022 Hinton and Mucha. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Microbiology Hinton, Andrew L. Mucha, Peter J. A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data |
title | A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data |
title_full | A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data |
title_fullStr | A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data |
title_full_unstemmed | A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data |
title_short | A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data |
title_sort | simultaneous feature selection and compositional association test for detecting sparse associations in high-dimensional metagenomic data |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8978828/ https://www.ncbi.nlm.nih.gov/pubmed/35387076 http://dx.doi.org/10.3389/fmicb.2022.837396 |
work_keys_str_mv | AT hintonandrewl asimultaneousfeatureselectionandcompositionalassociationtestfordetectingsparseassociationsinhighdimensionalmetagenomicdata AT muchapeterj asimultaneousfeatureselectionandcompositionalassociationtestfordetectingsparseassociationsinhighdimensionalmetagenomicdata AT hintonandrewl simultaneousfeatureselectionandcompositionalassociationtestfordetectingsparseassociationsinhighdimensionalmetagenomicdata AT muchapeterj simultaneousfeatureselectionandcompositionalassociationtestfordetectingsparseassociationsinhighdimensionalmetagenomicdata |
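
The description in this record summarizes SelEnergyPerm as a nonparametric group association test built on logratios between taxa. As a rough illustration only, and not the authors' R implementation, the Python sketch below shows the general shape of such a test: form pairwise logratios, compute a two-sample energy statistic (which the method's name suggests, though this record does not confirm the exact statistic), and estimate a p-value by permuting group labels. The function names, the pseudocount of 1, and the use of all pairwise logratios are assumptions made for this example. The embedded feature selection step described in the abstract is deliberately omitted.

```python
import numpy as np


def pairwise_logratios(counts, pseudocount=1.0):
    """All pairwise logratios log(x_i / x_j), i < j, for each sample (row).

    The pseudocount is an assumption of this sketch, used to avoid log(0).
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    i, j = np.triu_indices(logx.shape[1], k=1)
    return logx[:, i] - logx[:, j]


def energy_statistic(x, y):
    """Two-sample energy distance: 2*E|X-Y| - E|X-X'| - E|Y-Y'|."""
    def mean_dist(a, b):
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def permutation_pvalue(features, labels, n_perm=999, seed=0):
    """Permutation p-value for the energy statistic between two groups (0/1)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = energy_statistic(features[labels == 0], features[labels == 1])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)  # shuffle group membership
        stat = energy_statistic(features[perm == 0], features[perm == 1])
        exceed += int(stat >= observed)
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic example: 40 samples, 5 taxa, taxon 0 enriched in group 1.
    rng = np.random.default_rng(1)
    counts = rng.poisson(50, size=(40, 5))
    labels = np.array([0] * 20 + [1] * 20)
    counts[labels == 1, 0] *= 5
    stat, p = permutation_pvalue(pairwise_logratios(counts), labels)
    print(f"energy statistic = {stat:.3f}, permutation p-value = {p:.3f}")
```

In a method with embedded feature selection, the selection step would need to be repeated inside every permutation so that the null distribution reflects the full procedure. This sketch skips selection entirely, so it only illustrates the test statistic and the permutation scheme.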