Predictive analysis methods for human microbiome data with application to Parkinson’s disease
Microbiome data consists of operational taxonomic unit (OTU) counts characterized by zero-inflation, over-dispersion, and grouping structure among samples. Currently, statistical testing methods are commonly performed to identify OTUs that are associated with a phenotype. The limitations of statisti...
Main authors: | Dong, Mei; Li, Longhai; Chen, Man; Kusalik, Anthony; Xu, Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7446854/ https://www.ncbi.nlm.nih.gov/pubmed/32834004 http://dx.doi.org/10.1371/journal.pone.0237779 |
_version_ | 1783574202216873984 |
---|---|
author | Dong, Mei; Li, Longhai; Chen, Man; Kusalik, Anthony; Xu, Wei |
author_sort | Dong, Mei |
collection | PubMed |
description | Microbiome data consist of operational taxonomic unit (OTU) counts characterized by zero-inflation, over-dispersion, and grouping structure among samples. Currently, statistical testing methods are commonly used to identify OTUs that are associated with a phenotype. The limitations of statistical testing methods are that the validity of p-values/q-values depends sensitively on the correctness of the models and that statistical significance does not necessarily imply predictivity. Predictive analysis using methods such as LASSO is an alternative approach for identifying associated OTUs and for measuring the predictability of the phenotype variable from OTUs and other covariates. We investigate three strategies for performing predictive analysis: (1) LASSO: fitting a LASSO multinomial logistic regression model to all OTU counts with a specific transformation; (2) screening+GLM: screening OTUs with q-values returned by fitting a GLMM to each OTU, then fitting a GLM using the subset of selected OTUs; (3) screening+LASSO: fitting a LASSO to a subset of OTUs selected with GLMM. We have conducted empirical studies using three simulation datasets generated from Dirichlet-multinomial models and a real gut microbiome dataset related to Parkinson’s disease to investigate the performance of the three strategies for predictive analysis. Our simulation studies show that LASSO with an appropriate variable transformation performs remarkably well on zero-inflated data. Our real data analysis shows that Parkinson’s disease can be predicted with high accuracy from selected OTUs after the binary transformation, together with age and sex (Error Rate = 0.199, AUC = 0.872, AUPRC = 0.912). These results provide strong evidence of the relationship between Parkinson’s disease and the gut microbiome. |
format | Online Article Text |
id | pubmed-7446854 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7446854 2020-08-26 Predictive analysis methods for human microbiome data with application to Parkinson’s disease Dong, Mei Li, Longhai Chen, Man Kusalik, Anthony Xu, Wei PLoS One Research Article Public Library of Science 2020-08-24 /pmc/articles/PMC7446854/ /pubmed/32834004 http://dx.doi.org/10.1371/journal.pone.0237779 Text en © 2020 Dong et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Dong, Mei Li, Longhai Chen, Man Kusalik, Anthony Xu, Wei Predictive analysis methods for human microbiome data with application to Parkinson’s disease |
title | Predictive analysis methods for human microbiome data with application to Parkinson’s disease |
title_sort | predictive analysis methods for human microbiome data with application to parkinson’s disease |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7446854/ https://www.ncbi.nlm.nih.gov/pubmed/32834004 http://dx.doi.org/10.1371/journal.pone.0237779 |
work_keys_str_mv | AT dongmei predictiveanalysismethodsforhumanmicrobiomedatawithapplicationtoparkinsonsdisease AT lilonghai predictiveanalysismethodsforhumanmicrobiomedatawithapplicationtoparkinsonsdisease AT chenman predictiveanalysismethodsforhumanmicrobiomedatawithapplicationtoparkinsonsdisease AT kusalikanthony predictiveanalysismethodsforhumanmicrobiomedatawithapplicationtoparkinsonsdisease AT xuwei predictiveanalysismethodsforhumanmicrobiomedatawithapplicationtoparkinsonsdisease |
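
The first strategy named in the description (LASSO on transformed OTU counts) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the toy count table and labels are hypothetical, the model is a binary (not multinomial) logistic regression, the solver is a simple proximal-gradient loop swapped in for a production LASSO routine, and the function names (`binarize`, `fit_l1_logistic`, `error_rate`, `auc`) are invented for this example. The metrics are computed in-sample only to show how error rate and AUC are defined.

```python
import math


def binarize(counts):
    """Binary (presence/absence) transform: 1.0 if the OTU count is positive, else 0.0."""
    return [[1.0 if c > 0 else 0.0 for c in row] for row in counts]


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X, intercept, beta):
    """Predicted probability of class 1 for each row of X."""
    return [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) for row in X]


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=500):
    """Proximal-gradient fit of logistic regression with an L1 penalty (weight lam)
    on the coefficients; the intercept is not penalized."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [q - yi for q, yi in zip(predict_proba(X, intercept, beta), y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        # Gradient step followed by soft-thresholding sets weak coefficients to exactly 0.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


def error_rate(y, prob, cutoff=0.5):
    """Fraction of samples whose thresholded prediction differs from the label."""
    return sum((q >= cutoff) != (yi == 1) for q, yi in zip(prob, y)) / len(y)


def auc(y, score):
    """Area under the ROC curve as the share of (case, control) pairs ranked
    correctly, counting ties as one half."""
    pos = [s for s, yi in zip(score, y) if yi == 1]
    neg = [s for s, yi in zip(score, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 samples x 3 OTUs; label 1 = case, 0 = control.
counts = [[5, 0, 2], [3, 1, 0], [7, 0, 0], [0, 2, 1],
          [0, 0, 3], [0, 4, 0], [1, 0, 0], [0, 0, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X = binarize(counts)
intercept, beta = fit_l1_logistic(X, labels, lam=0.1)
prob = predict_proba(X, intercept, beta)
print("coefficients:", beta)
print("in-sample error rate:", error_rate(labels, prob))
print("in-sample AUC:", auc(labels, prob))
```

OTUs whose coefficients are shrunk to exactly zero are dropped from the model, which is how a LASSO fit doubles as OTU selection. A real analysis would choose the penalty by cross-validation and report the metrics on held-out samples.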