LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control
Compositional analysis is based on the premise that a relatively small proportion of taxa are differentially abundant, while the ratios of the relative abundances of the remaining taxa remain unchanged. Most existing methods use log-transformed data, but log-transformation of data with pervasive zer...
Main Authors: | Hu, Yingtian; Satten, Glen A.; Hu, Yi-Juan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences 2022 |
Subjects: | Physical Sciences |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9335309/ https://www.ncbi.nlm.nih.gov/pubmed/35867822 http://dx.doi.org/10.1073/pnas.2122788119 |
_version_ | 1784759309294370816 |
---|---|
author | Hu, Yingtian Satten, Glen A. Hu, Yi-Juan |
author_facet | Hu, Yingtian Satten, Glen A. Hu, Yi-Juan |
author_sort | Hu, Yingtian |
collection | PubMed |
description | Compositional analysis is based on the premise that a relatively small proportion of taxa are differentially abundant, while the ratios of the relative abundances of the remaining taxa remain unchanged. Most existing methods use log-transformed data, but log-transformation of data with pervasive zero counts is problematic, and these methods cannot always control the false discovery rate (FDR). Further, high-throughput microbiome data such as 16S amplicon or metagenomic sequencing are subject to experimental biases that are introduced in every step of the experimental workflow. McLaren et al. [eLife 8, e46923 (2019)] have recently proposed a model for how these biases affect relative abundance data. Motivated by this model, we show that the odds ratios in a logistic regression comparing counts in two taxa are invariant to experimental biases. With this motivation, we propose logistic compositional analysis (LOCOM), a robust logistic regression approach to compositional analysis, that does not require pseudocounts. Inference is based on permutation to account for overdispersion and small sample sizes. Traits can be either binary or continuous, and adjustment for confounders is supported. Our simulations indicate that LOCOM always preserved FDR and had much improved sensitivity over existing methods. In contrast, analysis of composition of microbiomes (ANCOM) and ANCOM with bias correction (ANCOM-BC)/ANOVA-Like Differential Expression tool (ALDEx2) had inflated FDR when the effect sizes were small and large, respectively. Only LOCOM was robust to experimental biases in every situation. The flexibility of our method for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. Our R package LOCOM is publicly available. |
format | Online Article Text |
id | pubmed-9335309 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-9335309 2023-01-22 LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control Hu, Yingtian Satten, Glen A. Hu, Yi-Juan Proc Natl Acad Sci U S A Physical Sciences Compositional analysis is based on the premise that a relatively small proportion of taxa are differentially abundant, while the ratios of the relative abundances of the remaining taxa remain unchanged. Most existing methods use log-transformed data, but log-transformation of data with pervasive zero counts is problematic, and these methods cannot always control the false discovery rate (FDR). Further, high-throughput microbiome data such as 16S amplicon or metagenomic sequencing are subject to experimental biases that are introduced in every step of the experimental workflow. McLaren et al. [eLife 8, e46923 (2019)] have recently proposed a model for how these biases affect relative abundance data. Motivated by this model, we show that the odds ratios in a logistic regression comparing counts in two taxa are invariant to experimental biases. With this motivation, we propose logistic compositional analysis (LOCOM), a robust logistic regression approach to compositional analysis, that does not require pseudocounts. Inference is based on permutation to account for overdispersion and small sample sizes. Traits can be either binary or continuous, and adjustment for confounders is supported. Our simulations indicate that LOCOM always preserved FDR and had much improved sensitivity over existing methods. In contrast, analysis of composition of microbiomes (ANCOM) and ANCOM with bias correction (ANCOM-BC)/ANOVA-Like Differential Expression tool (ALDEx2) had inflated FDR when the effect sizes were small and large, respectively. Only LOCOM was robust to experimental biases in every situation. The flexibility of our method for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. Our R package LOCOM is publicly available. National Academy of Sciences 2022-07-22 2022-07-26 /pmc/articles/PMC9335309/ /pubmed/35867822 http://dx.doi.org/10.1073/pnas.2122788119 Text en Copyright © 2022 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control |
topic | Physical Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9335309/ https://www.ncbi.nlm.nih.gov/pubmed/35867822 http://dx.doi.org/10.1073/pnas.2122788119 |
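
The description states that odds ratios comparing two taxa are invariant to experimental biases under the model of McLaren et al. The sketch below illustrates only that arithmetic. It assumes the bias acts as a taxon-specific multiplicative efficiency shared across samples, followed by renormalization. It is not the LOCOM package or its estimation procedure, and all function names, abundances, and efficiency values are hypothetical.

```python
import math


def observed_proportions(true_props, efficiencies):
    """Apply a multiplicative, taxon-specific bias and renormalize to sum to 1."""
    biased = [p * e for p, e in zip(true_props, efficiencies)]
    total = sum(biased)
    return [b / total for b in biased]


def log_odds_ratio(props_a, props_b, taxon, reference):
    """Log odds ratio of `taxon` versus `reference` between conditions A and B."""
    odds_a = props_a[taxon] / props_a[reference]
    odds_b = props_b[taxon] / props_b[reference]
    return math.log(odds_b / odds_a)


# Hypothetical true relative abundances of three taxa in two conditions.
# Taxon 0 changes; the ratio of taxa 1 and 2 stays at 1.5 in both conditions.
true_a = [0.50, 0.30, 0.20]
true_b = [0.25, 0.45, 0.30]

# Hypothetical measurement efficiencies, one per taxon (assumed values).
efficiencies = [1.0, 4.0, 0.2]

obs_a = observed_proportions(true_a, efficiencies)
obs_b = observed_proportions(true_b, efficiencies)

# The efficiency ratio e_0 / e_1 appears in both odds and cancels in the ratio.
lor_true = log_odds_ratio(true_a, true_b, taxon=0, reference=1)
lor_obs = log_odds_ratio(obs_a, obs_b, taxon=0, reference=1)
print(lor_true, lor_obs)
```

The observed proportions differ substantially from the true ones, yet the two log odds ratios agree up to floating-point error, because each within-condition odds is scaled by the same efficiency ratio.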