Cargando…

A Machine Learning Approach to Predicting Diabetes Complications

Diabetes mellitus (DM) is a chronic disease that is considered to be life-threatening. It can affect any part of the body over time, resulting in serious complications such as nephropathy, neuropathy, and retinopathy. In this work, several supervised classification algorithms were applied for buildi...

Descripción completa

Detalles Bibliográficos
Autores principales: Jian, Yazan, Pasquier, Michel, Sagahyroon, Assim, Aloul, Fadi
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8702133/
https://www.ncbi.nlm.nih.gov/pubmed/34946438
http://dx.doi.org/10.3390/healthcare9121712
Descripción
Sumario:Diabetes mellitus (DM) is a chronic disease that is considered to be life-threatening. It can affect any part of the body over time, resulting in serious complications such as nephropathy, neuropathy, and retinopathy. In this work, several supervised classification algorithms were applied for building different models to predict and classify eight diabetes complications. The complications include metabolic syndrome, dyslipidemia, neuropathy, nephropathy, diabetic foot, hypertension, obesity, and retinopathy. For this study, a dataset collected by the Rashid Center for Diabetes and Research (RCDR) located in Ajman, UAE, was utilized. The dataset consists of 884 records with 79 features. Some essential preprocessing steps were applied to handle the missing values and unbalanced data problems. Furthermore, feature selection was performed to select the top five and ten features for each complication. The final number of records used to train and build the binary classifiers for each complication was as follows: 428—metabolic syndrome, 836—dyslipidemia, 223—neuropathy, 233—nephropathy, 240—diabetic foot, 586—hypertension, 498—obesity, 228—retinopathy. Repeated stratified k-fold cross-validation (with k = 10 and a total of 10 repetitions) was employed for a better estimation of the performance. Accuracy and F1-score were used to evaluate the models’ performance reaching a maximum of 97.8% and 97.7% for accuracy and F1-scores, respectively. Moreover, by comparing the performance achieved using different attributes’ sets, it was found that by using a selected number of features, we can still build adequate classifiers.