Early Prediction of Diabetes Using an Ensemble of Machine Learning Models
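Main authors: | Dutta, Aishwariya; Hasan, Md. Kamrul; Ahmad, Mohiuddin; Awal, Md. Abdul; Islam, Md. Akhtarul; Masud, Mehedi; Meshref, Hossam |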
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9566114/ https://www.ncbi.nlm.nih.gov/pubmed/36231678 http://dx.doi.org/10.3390/ijerph191912378 |
_version_ | 1784809062108495872 |
---|---|
author | Dutta, Aishwariya; Hasan, Md. Kamrul; Ahmad, Mohiuddin; Awal, Md. Abdul; Islam, Md. Akhtarul; Masud, Mehedi; Meshref, Hossam |
collection | PubMed |
description | Diabetes is one of the most rapidly spreading diseases in the world and leads to an array of serious complications, including cardiovascular disease, kidney failure, diabetic retinopathy, and neuropathy, which increase morbidity and mortality rates. If diabetes is diagnosed at an early stage, its severity and underlying risk factors can be significantly reduced. However, predicting diabetes is challenging because reliable clinical datasets are scarce: labeled data are limited, and the available datasets often contain outliers or missing values. Therefore, we introduce a newly labeled diabetes dataset from a South Asian nation (Bangladesh). In addition, we propose an automated classification pipeline built on a weighted ensemble of machine learning (ML) classifiers: Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), XGBoost (XGB), and LightGBM (LGB). Grid search is used to tune the critical hyperparameters of these ML models, and the framework also includes missing-value imputation, feature selection, and K-fold cross-validation. A statistical analysis of variance (ANOVA) test shows that diabetes prediction performance improves significantly when the proposed weighted ensemble (DT + RF + XGB + LGB) is run with the introduced preprocessing, reaching the highest accuracy of [Formula: see text] and an area under the ROC curve (AUC) of [Formula: see text]. Combined with the proposed ensemble model, our statistical imputation and RF-based feature selection techniques produced the best results for early diabetes prediction. Moreover, the new dataset will contribute to developing and implementing robust ML models for diabetes prediction using population-level data. |
format | Online Article Text |
id | pubmed-9566114 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Int J Environ Res Public Health |
published | 2022-09-28 |
license | © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Early Prediction of Diabetes Using an Ensemble of Machine Learning Models |
topic | Article |