Early Prediction of Diabetes Using an Ensemble of Machine Learning Models


Bibliographic Details
Main Authors: Dutta, Aishwariya; Hasan, Md. Kamrul; Ahmad, Mohiuddin; Awal, Md. Abdul; Islam, Md. Akhtarul; Masud, Mehedi; Meshref, Hossam
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9566114/
https://www.ncbi.nlm.nih.gov/pubmed/36231678
http://dx.doi.org/10.3390/ijerph191912378
author Dutta, Aishwariya
Hasan, Md. Kamrul
Ahmad, Mohiuddin
Awal, Md. Abdul
Islam, Md. Akhtarul
Masud, Mehedi
Meshref, Hossam
collection PubMed
description Diabetes is one of the most rapidly spreading diseases in the world. It leads to significant complications, including cardiovascular disease, kidney failure, diabetic retinopathy, and neuropathy, which increase morbidity and mortality rates. If diabetes is diagnosed at an early stage, its severity and underlying risk factors can be significantly reduced. However, reliable labeled clinical datasets for diabetes prediction are scarce, and those that exist often contain outliers and missing values, which makes prediction challenging. Therefore, we introduce a newly labeled diabetes dataset from a South Asian nation (Bangladesh). In addition, we propose an automated classification pipeline that includes a weighted ensemble of machine learning (ML) classifiers: Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), XGBoost (XGB), and LightGBM (LGB). Grid search is used to tune the critical hyperparameters of these ML models. The framework also includes missing value imputation, feature selection, and K-fold cross-validation. An analysis of variance (ANOVA) test shows that diabetes prediction performance improves significantly when the proposed weighted ensemble (DT + RF + XGB + LGB) is combined with the introduced preprocessing, reaching a highest accuracy of [Formula: see text] and an area under the ROC curve (AUC) of [Formula: see text]. Combined with the proposed ensemble, our statistical imputation and RF-based feature selection techniques produced the best results for early diabetes prediction. Moreover, the new dataset will support the development and implementation of robust ML models for diabetes prediction using population-level data.
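The weighted ensemble mentioned in the description can be illustrated with a minimal soft-voting sketch. The abstract does not state how the weights are derived, so this example assumes each model's weight is a validation score such as its AUC. The function names, probabilities, and weights below are hypothetical and are not taken from the article.

```python
from typing import Dict, List


def weighted_soft_vote(probas: Dict[str, List[float]],
                       weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of per-model positive-class probabilities."""
    total = sum(weights[name] for name in probas)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(next(iter(probas.values())))
    combined = []
    for i in range(n_samples):
        weighted_sum = sum(weights[name] * probas[name][i] for name in probas)
        combined.append(weighted_sum / total)
    return combined


def predict_labels(combined: List[float], threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into 0/1 labels at the given threshold."""
    return [1 if p >= threshold else 0 for p in combined]


# Hypothetical positive-class probabilities for three patients, one list per model.
model_probas = {
    "DT": [0.9, 0.2, 0.6],
    "RF": [0.8, 0.3, 0.4],
    "XGB": [0.7, 0.1, 0.5],
    "LGB": [0.6, 0.2, 0.3],
}
# Hypothetical weights (assumed here to be validation AUCs, not reported values).
model_weights = {"DT": 0.70, "RF": 0.80, "XGB": 0.82, "LGB": 0.81}

ensemble_proba = weighted_soft_vote(model_probas, model_weights)
ensemble_labels = predict_labels(ensemble_proba)
print(ensemble_labels)
```

In a fuller pipeline, the probability lists would come from classifiers fitted after imputation and feature selection, and the weights would be estimated on held-out folds rather than fixed by hand.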
format Online
Article
Text
id pubmed-9566114
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9566114 2022-10-15 Early Prediction of Diabetes Using an Ensemble of Machine Learning Models Dutta, Aishwariya; Hasan, Md. Kamrul; Ahmad, Mohiuddin; Awal, Md. Abdul; Islam, Md. Akhtarul; Masud, Mehedi; Meshref, Hossam Int J Environ Res Public Health Article MDPI 2022-09-28 /pmc/articles/PMC9566114/ /pubmed/36231678 http://dx.doi.org/10.3390/ijerph191912378 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Early Prediction of Diabetes Using an Ensemble of Machine Learning Models
topic Article