
Randomized quantile residuals for diagnosing zero-inflated generalized linear mixed models with applications to microbiome count data

BACKGROUND: For differential abundance analysis, zero-inflated generalized linear models, typically zero-inflated NB models, have been increasingly used to model microbiome and other sequencing count data. A common assumption in estimating the false discovery rate is that the p values are uniformly...


Bibliographic Details
Main Authors: Bai, Wei; Dong, Mei; Li, Longhai; Feng, Cindy; Xu, Wei
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8620156/
https://www.ncbi.nlm.nih.gov/pubmed/34823466
http://dx.doi.org/10.1186/s12859-021-04371-6
author Bai, Wei
Dong, Mei
Li, Longhai
Feng, Cindy
Xu, Wei
collection PubMed
description BACKGROUND: For differential abundance analysis, zero-inflated generalized linear models, typically zero-inflated negative binomial (NB) models, have been increasingly used to model microbiome and other sequencing count data. A common assumption in estimating the false discovery rate (FDR) is that the p values are uniformly distributed under the null hypothesis, which requires that the postulated model fit the count data adequately. Mis-specification of the distribution of the count data may lead to excess false discoveries. Therefore, model checking is critical to controlling the FDR at a nominal level in differential abundance analysis. A growing number of studies show that the method of randomized quantile residuals (RQRs) performs well in diagnosing count regression models. However, the performance of RQRs in diagnosing zero-inflated generalized linear mixed models (GLMMs) for sequencing count data has not been extensively investigated in the literature. RESULTS: We conduct large-scale simulation studies to investigate the performance of RQRs for zero-inflated GLMMs. The simulation studies show that the type I error rates of the goodness-of-fit (GOF) tests with RQRs are very close to the nominal level; in addition, the scatter plots and Q–Q plots of RQRs are useful for distinguishing good models from bad ones. We also apply RQRs to diagnose six GLMMs fitted to a real microbiome dataset. The results show that the OTU counts at the genus level of this dataset (after a truncation treatment) can be modelled well by zero-inflated and zero-modified NB models. CONCLUSION: RQR is an excellent tool for diagnosing GLMMs for zero-inflated count data, particularly the sequencing count data arising in microbiome studies. In the supplementary materials, we provide two generic R functions, called rqr.glmmtmb and rqr.hurdle.glmmtmb, for calculating the RQRs from the fitting outputs of the R package glmmTMB. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04371-6.
format Online
Article
Text
id pubmed-8620156
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8620156 2021-11-29 Randomized quantile residuals for diagnosing zero-inflated generalized linear mixed models with applications to microbiome count data Bai, Wei; Dong, Mei; Li, Longhai; Feng, Cindy; Xu, Wei BMC Bioinformatics Research BioMed Central 2021-11-25 /pmc/articles/PMC8620156/ /pubmed/34823466 http://dx.doi.org/10.1186/s12859-021-04371-6 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Randomized quantile residuals for diagnosing zero-inflated generalized linear mixed models with applications to microbiome count data
topic Research
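The abstract in this record describes randomized quantile residuals (RQRs) for zero-inflated count models. As a rough illustration of the general idea only, the following Python sketch computes RQRs for a zero-inflated negative binomial distribution with known parameters. It is not the authors' rqr.glmmtmb function: it does not use glmmTMB output, it has no random effects, and the function names (nb_pmf, zinb_cdf, rqr, simulate_zinb) and parameter values are hypothetical choices made for this example. For a count y with model CDF F, a value u is drawn uniformly between F(y - 1) and F(y) and mapped through the standard normal quantile function; if the model is correct, the residuals should look approximately standard normal.

```python
import math
import random
from statistics import NormalDist


def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu and dispersion parameter size."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_cdf(y, mu, size, pi0):
    """CDF of a zero-inflated NB with extra point mass pi0 at zero."""
    if y < 0:
        return 0.0
    nb_cdf = sum(nb_pmf(k, mu, size) for k in range(y + 1))
    return min(1.0, pi0 + (1.0 - pi0) * nb_cdf)


def rqr(y, mu, size, pi0, rng):
    """Randomized quantile residual for one count y under the assumed model."""
    lower = zinb_cdf(y - 1, mu, size, pi0)
    upper = zinb_cdf(y, mu, size, pi0)
    u = rng.uniform(lower, upper)          # randomize within the CDF jump at y
    u = min(max(u, 1e-12), 1.0 - 1e-12)    # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)


def simulate_zinb(mu, size, pi0, rng):
    """Draw one zero-inflated NB count (gamma-Poisson mixture, Knuth sampler)."""
    if rng.random() < pi0:
        return 0                            # structural zero
    lam = rng.gammavariate(size, mu / size)
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


if __name__ == "__main__":
    rng = random.Random(2021)
    mu, size, pi0 = 3.0, 2.0, 0.3           # illustrative values, not from the paper
    counts = [simulate_zinb(mu, size, pi0, rng) for _ in range(2000)]
    residuals = [rqr(y, mu, size, pi0, rng) for y in counts]
    mean = sum(residuals) / len(residuals)
    var = sum((r - mean) ** 2 for r in residuals) / (len(residuals) - 1)
    print(f"mean = {mean:.3f}, sd = {math.sqrt(var):.3f}")
```

In practice the parameters would come from a fitted model rather than being fixed, and the residuals would typically be inspected with a Q–Q plot or a normality test, as the abstract describes.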