Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique

Diabetes is one of the most common and serious diseases affecting human health. Early diagnosis and treatment are vital to prevent or delay complications related to diabetes. An automated diabetes detection system assists physicians in the early diagnosis of the disease and reduces complications by...

Bibliographic Details
Main Author: Gündoğdu, Serdar
Format: Online Article Text
Language: English
Published: Springer US 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10043839/
https://www.ncbi.nlm.nih.gov/pubmed/37362660
http://dx.doi.org/10.1007/s11042-023-15165-8
_version_ 1784913238670966784
author Gündoğdu, Serdar
author_facet Gündoğdu, Serdar
author_sort Gündoğdu, Serdar
collection PubMed
description Diabetes is one of the most common and serious diseases affecting human health. Early diagnosis and treatment are vital to prevent or delay complications related to diabetes. An automated diabetes detection system assists physicians in the early diagnosis of the disease and reduces complications by providing fast and precise results. This study introduces a technique that combines multiple linear regression (MLR), random forest (RF), and XGBoost (XG) to diagnose diabetes from questionnaire data. In the proposed system, the MLR-RF algorithm is used for feature selection and XG for classification. The dataset is diabetes hospital data from Sylhet, Bangladesh. It contains 520 instances: 320 diabetic and 200 control. Classifier performance is measured in terms of accuracy (ACC), precision (PPV), recall (SEN, sensitivity), F1 score (F1), and the area under the receiver-operating-characteristic curve (AUC). The results show that the proposed system achieves an accuracy of 99.2%, an AUC of 99.3%, and a prediction time of 0.04825 seconds. The feature selection method improves the prediction time, although it does not affect the accuracy of the four compared classifiers. The results of this study compare favorably with those of other studies. The proposed method can be used as an auxiliary tool in diagnosing diabetes.
format Online
Article
Text
id pubmed-10043839
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-10043839 2023-03-28 Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique Gündoğdu, Serdar Multimed Tools Appl Article Diabetes is one of the most common and serious diseases affecting human health. Early diagnosis and treatment are vital to prevent or delay complications related to diabetes. An automated diabetes detection system assists physicians in the early diagnosis of the disease and reduces complications by providing fast and precise results. This study introduces a technique that combines multiple linear regression (MLR), random forest (RF), and XGBoost (XG) to diagnose diabetes from questionnaire data. In the proposed system, the MLR-RF algorithm is used for feature selection and XG for classification. The dataset is diabetes hospital data from Sylhet, Bangladesh. It contains 520 instances: 320 diabetic and 200 control. Classifier performance is measured in terms of accuracy (ACC), precision (PPV), recall (SEN, sensitivity), F1 score (F1), and the area under the receiver-operating-characteristic curve (AUC). The results show that the proposed system achieves an accuracy of 99.2%, an AUC of 99.3%, and a prediction time of 0.04825 seconds. The feature selection method improves the prediction time, although it does not affect the accuracy of the four compared classifiers. The results of this study compare favorably with those of other studies. The proposed method can be used as an auxiliary tool in diagnosing diabetes. Springer US 2023-03-28 /pmc/articles/PMC10043839/ /pubmed/37362660 http://dx.doi.org/10.1007/s11042-023-15165-8 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Gündoğdu, Serdar
Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique
title Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique
title_full Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique
title_fullStr Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique
title_full_unstemmed Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique
title_short Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique
title_sort efficient prediction of early-stage diabetes using xgboost classifier with random forest feature selection technique
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10043839/
https://www.ncbi.nlm.nih.gov/pubmed/37362660
http://dx.doi.org/10.1007/s11042-023-15165-8
work_keys_str_mv AT gundogduserdar efficientpredictionofearlystagediabetesusingxgboostclassifierwithrandomforestfeatureselectiontechnique
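
Note on the evaluation metrics: the description field above reports accuracy (ACC), precision (PPV), recall (SEN), and F1 on a dataset of 320 diabetic and 200 control instances. The following is a minimal Python sketch of how those four metrics are conventionally computed from binary labels. It is not the author's code. The names `binary_metrics` and `majority_baseline_accuracy` are hypothetical, and the toy labels are made up for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float   # ACC
    precision: float  # PPV
    recall: float     # SEN (sensitivity)
    f1: float         # F1 score


def binary_metrics(y_true, y_pred):
    """Compute ACC, PPV, SEN and F1 for 0/1 labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


def majority_baseline_accuracy(n_positive, n_negative):
    """Accuracy of always predicting the larger class."""
    return max(n_positive, n_negative) / (n_positive + n_negative)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))

    # Class counts as stated in the description: 320 diabetic, 200 control.
    print(round(majority_baseline_accuracy(320, 200), 3))
```

With the class counts given in the record, always predicting "diabetic" would already score about 61.5% accuracy, which is one reason the abstract reports precision, recall, F1, and AUC alongside accuracy.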