A Distance-Based Kernel Association Test Based on the Generalized Linear Mixed Model for Correlated Microbiome Studies

Bibliographic Details
Main authors: Koh, Hyunwook, Li, Yutong, Zhan, Xiang, Chen, Jun, Zhao, Ni
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6532659/
https://www.ncbi.nlm.nih.gov/pubmed/31156711
http://dx.doi.org/10.3389/fgene.2019.00458
author Koh, Hyunwook
Li, Yutong
Zhan, Xiang
Chen, Jun
Zhao, Ni
collection PubMed
description Researchers have increasingly employed family-based or longitudinal study designs to survey the roles of the human microbiota on diverse host traits of interest (e.g., health/disease status, medical intervention, behavioral/environmental factor). Such study designs are useful to properly control for potential confounders or the sensitive changes in microbial composition and host traits. However, downstream data analysis is challenging because the measurements within clusters (e.g., families, subjects including repeated measures) tend to be correlated so that statistical methods based on the independence assumption cannot be used. For the correlated microbiome studies, a distance-based kernel association test based on the linear mixed model, namely, correlated sequence kernel association test (cSKAT), has recently been introduced. cSKAT models the microbial community using an ecological distance (e.g., Jaccard/Bray-Curtis dissimilarity, unique fraction distance), and then tests its association with a host trait. Similar to prior distance-based kernel association tests (e.g., microbiome regression-based kernel association test), the use of ecological distances gives a high power to cSKAT. However, cSKAT is limited to handling Gaussian traits [e.g., body mass index (BMI)] and a single chosen distance measure at a time. The power of cSKAT differs a lot by which distance measure is used. However, choosing an optimal distance measure is challenging because of the unknown nature of the true association. Here, we introduce a distance-based kernel association test based on the generalized linear mixed model (GLMM), namely, GLMM-MiRKAT, to handle diverse types of traits, such as Gaussian (e.g., BMI), Binomial (e.g., disease status, treatment/placebo) or Poisson (e.g., number of tumors/treatments) traits. We further propose a data-driven adaptive test of GLMM-MiRKAT, namely, aGLMM-MiRKAT, so as to avoid the need to choose the optimal distance measure. Our extensive simulations demonstrate that aGLMM-MiRKAT is robustly powerful while correctly controlling type I error rates. We apply aGLMM-MiRKAT to real familial and longitudinal microbiome data, where we discover significant disparity in microbial community composition by BMI status and the frequency of antibiotic use. In summary, aGLMM-MiRKAT is a useful analytical tool with its broad applicability to diverse types of traits, robust power and valid statistical inference.
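
The abstract describes modeling the microbial community through an ecological distance before testing association with a host trait. As a rough illustration only, the Python sketch below computes a Bray-Curtis dissimilarity matrix from a small made-up count table and converts it to a similarity kernel. The conversion shown (Gower double-centering of the squared distances, with negative eigenvalues set to zero) is an assumption based on common practice in distance-based kernel tests; this record does not state which transformation the authors use. The function names `bray_curtis` and `distance_to_kernel` and the example counts are hypothetical.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between samples (rows) of a count table."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            # Two empty samples are treated as identical (distance 0).
            dist[i, j] = dist[j, i] = num / den if den > 0 else 0.0
    return dist


def distance_to_kernel(dist):
    """Turn a distance matrix into a positive semi-definite kernel.

    Assumed transformation (not taken from the record):
    K = -1/2 * J * D^2 * J with J = I - 11'/n, followed by
    truncating negative eigenvalues to zero.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    kernel = -0.5 * centering @ (dist ** 2) @ centering
    kernel = (kernel + kernel.T) / 2  # remove floating-point asymmetry
    eigvals, eigvecs = np.linalg.eigh(kernel)
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * eigvals) @ eigvecs.T


# Hypothetical count table: 4 samples (rows) by 3 taxa (columns).
counts = np.array([[10, 0, 5],
                   [8, 2, 5],
                   [0, 12, 1],
                   [1, 9, 3]])
D = bray_curtis(counts)
K = distance_to_kernel(D)
```

A kernel built this way could then enter a mixed model as the covariance structure of a random effect; the test statistic and the adaptive combination across distance measures are beyond what this record describes and are not sketched here.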
format Online
Article
Text
id pubmed-6532659
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
Record: pubmed-6532659 (2019-05-31)
Journal: Front Genet
Published: Frontiers Media S.A., 2019-05-16
Rights: Copyright © 2019 Koh, Li, Zhan, Chen and Zhao. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics