Crohn’s Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome

Human microbiota refers to the trillions of microorganisms that inhabit our bodies and have been discovered to have a substantial impact on human health and disease. By sampling the microbiota, it is possible to generate massive quantities of data for analysis using Machine Learning algorithms. In t...

Bibliographic Details
Main Authors: Unal, Metehan, Bostanci, Erkan, Ozkul, Ceren, Acici, Koray, Asuroglu, Tunc, Guzel, Mehmet Serdar
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10486516/
https://www.ncbi.nlm.nih.gov/pubmed/37685376
http://dx.doi.org/10.3390/diagnostics13172835
_version_ 1785103025048649728
author Unal, Metehan
Bostanci, Erkan
Ozkul, Ceren
Acici, Koray
Asuroglu, Tunc
Guzel, Mehmet Serdar
author_facet Unal, Metehan
Bostanci, Erkan
Ozkul, Ceren
Acici, Koray
Asuroglu, Tunc
Guzel, Mehmet Serdar
author_sort Unal, Metehan
collection PubMed
description Human microbiota refers to the trillions of microorganisms that inhabit our bodies and have been discovered to have a substantial impact on human health and disease. By sampling the microbiota, it is possible to generate massive quantities of data for analysis using Machine Learning algorithms. In this study, we employed several modern Machine Learning techniques to predict Inflammatory Bowel Disease using raw sequence data. The dataset was obtained from NCBI preprocessed graph representations and converted into a structured form. Seven well-known Machine Learning frameworks, including Random Forest, Support Vector Machines, Extreme Gradient Boosting, Light Gradient Boosting Machine, Gaussian Naïve Bayes, Logistic Regression, and k-Nearest Neighbor, were used. Grid Search was employed for hyperparameter optimization. The performance of the Machine Learning models was evaluated using various metrics such as accuracy, precision, F-score, kappa, and area under the receiver operating characteristic curve. Additionally, McNemar’s test was conducted to assess the statistical significance of the experiment. The data was constructed using k-mer lengths of 3, 4 and 5. The Light Gradient Boosting Machine model outperformed the other models, with 67.24%, 74.63% and 76.47% accuracy for k-mer lengths of 3, 4 and 5, respectively. The LightGBM model also demonstrated the best performance in each metric. The study showed promising results for predicting disease from raw sequence data. Finally, McNemar’s test results found statistically significant differences between different Machine Learning approaches.
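The description states that the data was constructed using k-mer lengths of 3, 4 and 5. The record does not say how the k-mers were turned into features, so the following is only a minimal sketch of one common approach: counting overlapping k-mers over a fixed nucleotide vocabulary. The function names `kmer_vocabulary` and `kmer_counts`, the `ACGT` alphabet, and the choice to skip windows containing other symbols are assumptions made for illustration, not the authors' pipeline.

```python
from collections import Counter
from itertools import product


def kmer_vocabulary(k, alphabet="ACGT"):
    """Return every possible k-mer over the alphabet, in a fixed order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_counts(sequence, k, alphabet="ACGT"):
    """Count overlapping k-mers in a sequence (hypothetical helper).

    Windows containing symbols outside the alphabet (for example 'N')
    are not in the vocabulary and are therefore ignored.
    """
    vocab = kmer_vocabulary(k, alphabet)
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get(kmer, 0) for kmer in vocab]


if __name__ == "__main__":
    # One feature vector per k-mer length mentioned in the description.
    read = "ACGTACGTTAGC"
    for k in (3, 4, 5):
        features = kmer_counts(read, k)
        print(k, len(features), sum(features))
```

Under these assumptions the feature vector has 4^k entries (64, 256 and 1024 for k = 3, 4 and 5), which is the kind of structured table a classifier such as LightGBM can consume.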
format Online
Article
Text
id pubmed-10486516
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10486516 2023-09-09 Crohn’s Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome Unal, Metehan Bostanci, Erkan Ozkul, Ceren Acici, Koray Asuroglu, Tunc Guzel, Mehmet Serdar Diagnostics (Basel) Article Human microbiota refers to the trillions of microorganisms that inhabit our bodies and have been discovered to have a substantial impact on human health and disease. By sampling the microbiota, it is possible to generate massive quantities of data for analysis using Machine Learning algorithms. In this study, we employed several modern Machine Learning techniques to predict Inflammatory Bowel Disease using raw sequence data. The dataset was obtained from NCBI preprocessed graph representations and converted into a structured form. Seven well-known Machine Learning frameworks, including Random Forest, Support Vector Machines, Extreme Gradient Boosting, Light Gradient Boosting Machine, Gaussian Naïve Bayes, Logistic Regression, and k-Nearest Neighbor, were used. Grid Search was employed for hyperparameter optimization. The performance of the Machine Learning models was evaluated using various metrics such as accuracy, precision, F-score, kappa, and area under the receiver operating characteristic curve. Additionally, McNemar’s test was conducted to assess the statistical significance of the experiment. The data was constructed using k-mer lengths of 3, 4 and 5. The Light Gradient Boosting Machine model outperformed the other models, with 67.24%, 74.63% and 76.47% accuracy for k-mer lengths of 3, 4 and 5, respectively. The LightGBM model also demonstrated the best performance in each metric. The study showed promising results for predicting disease from raw sequence data. Finally, McNemar’s test results found statistically significant differences between different Machine Learning approaches.
MDPI 2023-09-01 /pmc/articles/PMC10486516/ /pubmed/37685376 http://dx.doi.org/10.3390/diagnostics13172835 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
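The abstract reports that McNemar’s test was used to compare the Machine Learning approaches, but the record does not say which variant was applied. As a hedged illustration only, the sketch below implements the exact (binomial) form of the test on the discordant predictions of two classifiers evaluated on the same samples. The function name `mcnemar_exact` and the toy labels are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers (illustrative sketch).

    b = samples model A gets right and model B gets wrong
    c = samples model A gets wrong and model B gets right
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        # The two models never disagree on correctness: no evidence of a difference.
        return b, c, 1.0
    # Probability of a split at least as lopsided as the observed one, one tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy labels only; these are not results from the study.
    y_true = [1] * 10
    model_a = [1] * 10
    model_b = [0] * 6 + [1] * 4
    print(mcnemar_exact(y_true, model_a, model_b))
```

Only the discordant pairs enter the test, so two models with similar accuracy can still differ significantly if they err on different samples.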
spellingShingle Article
Unal, Metehan
Bostanci, Erkan
Ozkul, Ceren
Acici, Koray
Asuroglu, Tunc
Guzel, Mehmet Serdar
Crohn’s Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome
title Crohn’s Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome
title_full Crohn’s Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome
title_fullStr Crohn’s Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome
title_full_unstemmed Crohn’s Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome
title_short Crohn’s Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome
title_sort crohn’s disease prediction using sequence based machine learning analysis of human microbiome
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10486516/
https://www.ncbi.nlm.nih.gov/pubmed/37685376
http://dx.doi.org/10.3390/diagnostics13172835
work_keys_str_mv AT unalmetehan crohnsdiseasepredictionusingsequencebasedmachinelearninganalysisofhumanmicrobiome
AT bostancierkan crohnsdiseasepredictionusingsequencebasedmachinelearninganalysisofhumanmicrobiome
AT ozkulceren crohnsdiseasepredictionusingsequencebasedmachinelearninganalysisofhumanmicrobiome
AT acicikoray crohnsdiseasepredictionusingsequencebasedmachinelearninganalysisofhumanmicrobiome
AT asuroglutunc crohnsdiseasepredictionusingsequencebasedmachinelearninganalysisofhumanmicrobiome
AT guzelmehmetserdar crohnsdiseasepredictionusingsequencebasedmachinelearninganalysisofhumanmicrobiome