Prediction of diabetes disease using an ensemble of machine learning multi-classifier models

BACKGROUND AND OBJECTIVE: Diabetes is a life-threatening chronic disease with a growing global prevalence, necessitating early diagnosis and treatment to prevent severe complications. Machine learning has emerged as a promising approach for diabetes diagnosis, but challenges such as limited labeled...

Bibliographic Details
Main authors: Abnoosian, Karlo, Farnoosh, Rahman, Behzadi, Mohammad Hassan
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10496262/
https://www.ncbi.nlm.nih.gov/pubmed/37697283
http://dx.doi.org/10.1186/s12859-023-05465-z
author Abnoosian, Karlo
Farnoosh, Rahman
Behzadi, Mohammad Hassan
collection PubMed
description BACKGROUND AND OBJECTIVE: Diabetes is a life-threatening chronic disease with a growing global prevalence, necessitating early diagnosis and treatment to prevent severe complications. Machine learning has emerged as a promising approach for diabetes diagnosis, but challenges such as limited labeled data, frequent missing values, and dataset imbalance hinder the development of accurate prediction models. Therefore, a novel framework is required to address these challenges and improve performance.
METHODS: In this study, we propose an innovative pipeline-based multi-classification framework to predict diabetes in three classes: diabetic, non-diabetic, and prediabetes, using the imbalanced Iraqi Patient Dataset of Diabetes. Our framework incorporates various pre-processing techniques, including duplicate sample removal, attribute conversion, missing value imputation, data normalization and standardization, feature selection, and k-fold cross-validation. Furthermore, we implement multiple machine learning models, such as k-NN, SVM, DT, RF, AdaBoost, and GNB, and introduce a weighted ensemble approach based on the Area Under the Receiver Operating Characteristic Curve (AUC) to address dataset imbalance. Performance optimization is achieved through grid search and Bayesian optimization for hyper-parameter tuning.
RESULTS: Our proposed model outperforms other machine learning models, including k-NN, SVM, DT, RF, AdaBoost, and GNB, in predicting diabetes. The model achieves high average accuracy, precision, recall, F1-score, and AUC values of 0.9887, 0.9861, 0.9792, 0.9851, and 0.999, respectively.
CONCLUSION: Our pipeline-based multi-classification framework demonstrates promising results in accurately predicting diabetes using an imbalanced dataset of Iraqi diabetic patients. The proposed framework addresses the challenges associated with limited labeled data, missing values, and dataset imbalance, leading to improved prediction performance. This study highlights the potential of machine learning techniques in diabetes diagnosis and management, and the proposed framework can serve as a valuable tool for accurate prediction and improved patient care. Further research can build upon our work to refine and optimize the framework and explore its applicability in diverse datasets and populations.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05465-z.
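The abstract mentions a weighted ensemble based on AUC but does not give the weighting formula. The following is a minimal sketch of one plausible reading, assuming each model's weight is its validation AUC divided by the sum of all AUCs, and that the ensemble averages the models' class-probability vectors with those weights. The model names, AUC values, and probabilities are hypothetical illustration data, not results from the paper.

```python
from typing import Dict, List, Sequence

# Class order is an assumption made for this sketch.
CLASSES = ["non-diabetic", "prediabetes", "diabetic"]


def auc_weights(aucs: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-model AUC scores so the weights sum to 1 (assumed scheme)."""
    total = sum(aucs.values())
    if total <= 0:
        raise ValueError("AUC scores must sum to a positive value")
    return {name: auc / total for name, auc in aucs.items()}


def weighted_vote(
    probas: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Return the weighted average of each model's class-probability vector."""
    combined = [0.0] * len(CLASSES)
    for name, p in probas.items():
        for k in range(len(CLASSES)):
            combined[k] += weights[name] * p[k]
    return combined


def predict(probas: Dict[str, Sequence[float]], weights: Dict[str, float]) -> str:
    """Pick the class with the highest combined probability."""
    combined = weighted_vote(probas, weights)
    best = max(range(len(combined)), key=combined.__getitem__)
    return CLASSES[best]


# Hypothetical validation AUCs for three base models.
aucs = {"knn": 0.90, "svm": 0.95, "rf": 0.99}
# Hypothetical class probabilities from each model for a single patient.
probas = {
    "knn": [0.6, 0.3, 0.1],
    "svm": [0.2, 0.5, 0.3],
    "rf": [0.1, 0.2, 0.7],
}

weights = auc_weights(aucs)
label = predict(probas, weights)
print(label)
```

Under this scheme a model with a higher AUC pulls the combined probabilities further toward its own prediction, which is one way an ensemble can favor models that separate the minority classes well.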
format Online
Article
Text
id pubmed-10496262
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10496262 2023-09-13 Prediction of diabetes disease using an ensemble of machine learning multi-classifier models Abnoosian, Karlo Farnoosh, Rahman Behzadi, Mohammad Hassan BMC Bioinformatics Research BioMed Central 2023-09-12 /pmc/articles/PMC10496262/ /pubmed/37697283 http://dx.doi.org/10.1186/s12859-023-05465-z Text en © The Author(s) 2023, corrected publication 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Prediction of diabetes disease using an ensemble of machine learning multi-classifier models
topic Research