
A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data

Numerous metagenomic studies aim to discover associations between the microbial composition of an environment (e.g., gut, skin, oral) and a phenotype of interest. Multivariate analysis is often performed in these studies without critical a priori knowledge of which taxa are associated with the pheno...


Bibliographic Details
Main authors: Hinton, Andrew L.; Mucha, Peter J.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8978828/
https://www.ncbi.nlm.nih.gov/pubmed/35387076
http://dx.doi.org/10.3389/fmicb.2022.837396
author Hinton, Andrew L.
Mucha, Peter J.
collection PubMed
description Numerous metagenomic studies aim to discover associations between the microbial composition of an environment (e.g., gut, skin, oral) and a phenotype of interest. Multivariate analysis is often performed in these studies without critical a priori knowledge of which taxa are associated with the phenotype being studied. This approach typically reduces statistical power in settings where the true associations among only a few taxa are obscured by high dimensionality (i.e., sparse association signals). At the same time, low sample size and compositional sample space constraints may reduce beyond-study generalizability if not properly accounted for. To address these difficulties, we developed the Selection-Energy-Permutation (SelEnergyPerm) method, a nonparametric group association test with embedded feature selection that directly accounts for compositional constraints using parsimonious logratio signatures between taxonomic features, for characterizing and understanding alterations in microbial community structure. Simulation results show SelEnergyPerm selects small independent sets of logratios that capture strong associations in a range of scenarios. Additionally, our simulation results demonstrate SelEnergyPerm consistently detects/rejects associations in synthetic data with sparse, dense, or no association signals. We demonstrate the novel benefits of our method in four case studies utilizing publicly available 16S amplicon and whole-genome sequencing datasets. Our R implementation of Selection-Energy-Permutation, including an example demonstration and the code to generate all of the scenarios used here, is available at https://www.github.com/andrew84830813/selEnergyPermR.
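The description above names the ingredients of the test (logratios between taxa, an energy statistic, and permutation) without giving detail. The following is a minimal Python sketch of that general idea, not the authors' selEnergyPermR implementation: it assumes strictly positive relative abundances, uses all pairwise logratios rather than a selected subset (the embedded feature-selection step is omitted entirely), and all function names are hypothetical.

```python
import itertools
import math
import random


def pairwise_logratios(sample):
    """Return log(x_i / x_j) for all i < j; parts must be strictly positive."""
    return [math.log(sample[i] / sample[j])
            for i, j in itertools.combinations(range(len(sample)), 2)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distance(xs, ys):
    """Average distance over every pair (x, y) with x in xs and y in ys."""
    return sum(euclidean(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_statistic(xs, ys):
    """Two-sample energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return (2 * mean_distance(xs, ys)
            - mean_distance(xs, xs)
            - mean_distance(ys, ys))


def permutation_test(group_a, group_b, n_perm=199, seed=0):
    """Permutation p-value for the energy distance between two groups.

    Assumption: every sample is a composition with strictly positive parts.
    Zero handling and logratio selection are left out of this sketch.
    """
    rng = random.Random(seed)
    a = [pairwise_logratios(s) for s in group_a]
    b = [pairwise_logratios(s) for s in group_b]
    observed = energy_statistic(a, b)
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels by shuffling the pooled samples.
        rng.shuffle(pooled)
        stat = energy_statistic(pooled[:len(a)], pooled[len(a):])
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Because logratios are unchanged when a sample is rescaled, the statistic does not depend on whether the inputs are raw counts or proportions, which is one common way of respecting the compositional constraint mentioned in the description.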
format Online
Article
Text
id pubmed-8978828
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8978828 2022-04-05 A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data Hinton, Andrew L. Mucha, Peter J. Front Microbiol Microbiology Frontiers Media S.A.
2022-03-21 /pmc/articles/PMC8978828/ /pubmed/35387076 http://dx.doi.org/10.3389/fmicb.2022.837396 Text en Copyright © 2022 Hinton and Mucha. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8978828/
https://www.ncbi.nlm.nih.gov/pubmed/35387076
http://dx.doi.org/10.3389/fmicb.2022.837396