
Predictive analysis methods for human microbiome data with application to Parkinson’s disease

Microbiome data consists of operational taxonomic unit (OTU) counts characterized by zero-inflation, over-dispersion, and grouping structure among samples. Currently, statistical testing methods are commonly performed to identify OTUs that are associated with a phenotype. The limitations of statisti...


Bibliographic Details
Main authors: Dong, Mei; Li, Longhai; Chen, Man; Kusalik, Anthony; Xu, Wei
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7446854/
https://www.ncbi.nlm.nih.gov/pubmed/32834004
http://dx.doi.org/10.1371/journal.pone.0237779
_version_ 1783574202216873984
author Dong, Mei
Li, Longhai
Chen, Man
Kusalik, Anthony
Xu, Wei
author_facet Dong, Mei
Li, Longhai
Chen, Man
Kusalik, Anthony
Xu, Wei
author_sort Dong, Mei
collection PubMed
description Microbiome data consists of operational taxonomic unit (OTU) counts characterized by zero-inflation, over-dispersion, and grouping structure among samples. Currently, statistical testing methods are commonly performed to identify OTUs that are associated with a phenotype. The limitations of statistical testing methods include that the validity of p-values/q-values depends sensitively on the correctness of models and that statistical significance does not necessarily imply predictivity. Predictive analysis using methods such as LASSO is an alternative approach for identifying associated OTUs and for measuring the predictability of the phenotype variable with OTUs and other covariate variables. We investigate three strategies of performing predictive analysis: (1) LASSO: fitting a LASSO multinomial logistic regression model to all OTU counts with a specific transformation; (2) screening+GLM: screening OTUs with q-values returned by fitting a GLMM to each OTU, then fitting a GLM using a subset of selected OTUs; (3) screening+LASSO: fitting a LASSO to a subset of OTUs selected with GLMM. We have conducted empirical studies using three simulation datasets generated using Dirichlet-multinomial models and a real gut microbiome dataset related to Parkinson’s disease to investigate the performance of the three strategies for predictive analysis. Our simulation studies show that LASSO with an appropriate variable transformation predicts remarkably well on zero-inflated data. Our results of real data analysis show that Parkinson’s disease can be predicted with high accuracy from selected OTUs (after the binary transformation), age, and sex (Error Rate = 0.199, AUC = 0.872, AUPRC = 0.912). These results provide strong evidence of the relationship between Parkinson’s disease and the gut microbiome.
format Online
Article
Text
id pubmed-7446854
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7446854 2020-08-26 Predictive analysis methods for human microbiome data with application to Parkinson’s disease Dong, Mei Li, Longhai Chen, Man Kusalik, Anthony Xu, Wei PLoS One Research Article Public Library of Science 2020-08-24 /pmc/articles/PMC7446854/ /pubmed/32834004 http://dx.doi.org/10.1371/journal.pone.0237779 Text en © 2020 Dong et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Dong, Mei
Li, Longhai
Chen, Man
Kusalik, Anthony
Xu, Wei
Predictive analysis methods for human microbiome data with application to Parkinson’s disease
title Predictive analysis methods for human microbiome data with application to Parkinson’s disease
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7446854/
https://www.ncbi.nlm.nih.gov/pubmed/32834004
http://dx.doi.org/10.1371/journal.pone.0237779