
LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control

Bibliographic Details
Main authors: Hu, Yingtian; Satten, Glen A.; Hu, Yi-Juan
Format: Online article, text
Language: English
Published: National Academy of Sciences, 2022
Subjects: Physical Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9335309/
https://www.ncbi.nlm.nih.gov/pubmed/35867822
http://dx.doi.org/10.1073/pnas.2122788119
Collection: PubMed

Abstract
Compositional analysis is based on the premise that a relatively small proportion of taxa are differentially abundant, while the ratios of the relative abundances of the remaining taxa remain unchanged. Most existing methods use log-transformed data, but log-transformation of data with pervasive zero counts is problematic (the logarithm of zero is undefined, so a pseudocount must be added), and these methods cannot always control the false discovery rate (FDR). Further, high-throughput microbiome data, such as those from 16S amplicon or metagenomic sequencing, are subject to experimental biases that are introduced at every step of the experimental workflow. McLaren et al. [eLife 8, e46923 (2019)] have recently proposed a model for how these biases affect relative abundance data. Motivated by this model, we show that the odds ratios in a logistic regression comparing counts in two taxa are invariant to experimental biases. On this basis, we propose logistic compositional analysis (LOCOM), a robust logistic regression approach to compositional analysis that does not require pseudocounts. Inference is based on permutation to account for overdispersion and small sample sizes. Traits can be either binary or continuous, and adjustment for confounders is supported. Our simulations indicate that LOCOM always preserved FDR and had much improved sensitivity over existing methods. In contrast, analysis of composition of microbiomes (ANCOM) had inflated FDR when the effect sizes were small, while ANCOM with bias correction (ANCOM-BC) and the ANOVA-Like Differential Expression tool (ALDEx2) had inflated FDR when the effect sizes were large. Only LOCOM was robust to experimental biases in every situation. The flexibility of our method for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. Our R package LOCOM is publicly available.
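The abstract's central technical claim is that, under McLaren et al.'s model, in which each taxon's reads are scaled by a taxon-specific detection efficiency, the bias shifts the log odds of a read falling in taxon j rather than taxon k by the same constant, log(b_j/b_k), in every sample, so it cancels from the odds ratio between trait groups. The Python sketch below illustrates that cancellation on hypothetical toy counts, together with a generic label-permutation test. It is not the LOCOM package or its estimator: the pooled two-taxon odds ratio, the helper names (log_odds_ratio, permutation_pvalue, apply_taxon_bias), the efficiencies 3.0 and 0.2, and the data are all assumptions made for illustration.

```python
import math
import random


def log_odds_ratio(counts_j, counts_k, trait):
    """Pooled log odds that a read from taxa j or k belongs to j, trait 1 vs trait 0.

    Hypothetical helper for illustration only; LOCOM fits a robust logistic
    regression, which this simple pooled statistic does not reproduce.
    """
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # trait -> [reads in j, reads in k]
    for cj, ck, t in zip(counts_j, counts_k, trait):
        totals[t][0] += cj
        totals[t][1] += ck
    odds_1 = totals[1][0] / totals[1][1]
    odds_0 = totals[0][0] / totals[0][1]
    return math.log(odds_1 / odds_0)


def permutation_pvalue(counts_j, counts_k, trait, n_perm=999, seed=1):
    """Two-sided permutation p-value for the pooled log odds ratio (generic sketch)."""
    rng = random.Random(seed)
    observed = abs(log_odds_ratio(counts_j, counts_k, trait))
    labels = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any association between trait and taxa
        stat = abs(log_odds_ratio(counts_j, counts_k, labels))
        if stat >= observed - 1e-12:  # small tolerance so exact ties count as hits
            hits += 1
    return (hits + 1) / (n_perm + 1)


def apply_taxon_bias(counts, efficiency):
    """Scale one taxon's counts by a hypothetical detection efficiency.

    Mimics only the taxon-specific multiplicative factor in McLaren et al.'s
    model; the model's sample-specific normalization is omitted here.
    """
    return [efficiency * c for c in counts]


# Hypothetical toy data: 8 samples, binary trait, reads in taxa j and k.
trait = [0, 0, 0, 0, 1, 1, 1, 1]
counts_j = [10, 12, 0, 8, 30, 25, 40, 22]  # the zero needs no pseudocount
counts_k = [20, 18, 15, 22, 10, 12, 9, 14]

biased_j = apply_taxon_bias(counts_j, 3.0)
biased_k = apply_taxon_bias(counts_k, 0.2)

lor_true = log_odds_ratio(counts_j, counts_k, trait)
lor_biased = log_odds_ratio(biased_j, biased_k, trait)
p_true = permutation_pvalue(counts_j, counts_k, trait)
p_biased = permutation_pvalue(biased_j, biased_k, trait)

print(f"log OR: {lor_true:.6f} (unbiased) vs {lor_biased:.6f} (biased)")
print(f"permutation p: {p_true:.4f} (unbiased) vs {p_biased:.4f} (biased)")
```

For this toy data the pooled odds ratio is (117/45)/(30/75) = 6.5 with or without the bias factors, because the factor 3.0/0.2 multiplies both groups' odds and cancels. Each permuted statistic cancels the same way, so with the same seed the permutation p-values agree. Pooling reads within each group also sidesteps the zero count without a pseudocount, although a group with no reads in taxon j or taxon k would still leave this statistic undefined.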

Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Proc Natl Acad Sci U S A (section: Physical Sciences)
Dates: 2022-07-22, 2022-07-26
Copyright © 2022 the Author(s). Published by PNAS.
License: This article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND), https://creativecommons.org/licenses/by-nc-nd/4.0/