Development and Comparison of Three Data Models for Predicting Diabetes Mellitus Using Risk Factors in a Nigerian Population

OBJECTIVES: This study developed and compared the performance of three widely used predictive models—logistic regression (LR), artificial neural network (ANN), and decision tree (DT)—to predict diabetes mellitus using the socio-demographic, lifestyle, and physical attributes of a population of Niger...

Bibliographic Details
Main authors: Odukoya, Oluwakemi, Nwaneri, Solomon, Odeniyi, Ifedayo, Akodu, Babatunde, Oluwole, Esther, Olorunfemi, Gbenga, Popoola, Oluwatoyin, Osuntoki, Akinniyi
Format: Online Article Text
Language: English
Published: Korean Society of Medical Informatics 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8850175/
https://www.ncbi.nlm.nih.gov/pubmed/35172091
http://dx.doi.org/10.4258/hir.2022.28.1.58
author Odukoya, Oluwakemi
Nwaneri, Solomon
Odeniyi, Ifedayo
Akodu, Babatunde
Oluwole, Esther
Olorunfemi, Gbenga
Popoola, Oluwatoyin
Osuntoki, Akinniyi
author_sort Odukoya, Oluwakemi
collection PubMed
description OBJECTIVES: This study developed and compared the performance of three widely used predictive models, namely logistic regression (LR), artificial neural network (ANN), and decision tree (DT), to predict diabetes mellitus using the socio-demographic, lifestyle, and physical attributes of a population of Nigerians.
METHODS: We developed three predictive models using 10 input variables. Data preprocessing steps included the removal of missing values and outliers, min-max normalization, and feature extraction using principal component analysis. Model training and validation were accomplished using 10-fold cross-validation. Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC) were used as performance evaluation metrics. Analysis and model development were performed in R version 3.6.1.
RESULTS: The mean age of the participants was 50.52 ± 16.14 years. The classification accuracy, sensitivity, specificity, PPV, and NPV for LR were, respectively, 81.31%, 84.32%, 77.24%, 72.75%, and 82.49%. Those for ANN were 98.64%, 98.37%, 99.00%, 98.61%, and 98.83%, and those for DT were 99.05%, 99.76%, 98.08%, 98.77%, and 99.82%, respectively. The best-performing and poorest-performing classifiers were DT and LR, with 99.05% and 81.31% accuracy, respectively. Similarly, the DT algorithm achieved the best AUROC value (0.992) compared to ANN (0.976) and LR (0.892).
CONCLUSIONS: Our study demonstrated that DT, LR, and ANN models can be used effectively for the prediction of diabetes mellitus in the Nigerian population based on certain risk factors. An overall comparative analysis of the models showed that the DT model performed better than LR and ANN.
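The five threshold-based metrics named in the abstract (accuracy, sensitivity, specificity, PPV, and NPV) all follow from the four cells of a binary confusion matrix. The following is a minimal sketch in Python, not the authors' R code: the names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts are made up for illustration rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix, with 'diabetic' as the positive class."""
    tp: int  # diabetic, predicted diabetic
    fp: int  # non-diabetic, predicted diabetic
    tn: int  # non-diabetic, predicted non-diabetic
    fn: int  # diabetic, predicted non-diabetic


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the five metrics as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. both classes occur
    and the classifier predicts each class at least once.
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


# Hypothetical counts for illustration only; not data from the study.
example = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under 10-fold cross-validation, one plausible approach is to compute these values per fold and average them, or to pool the counts across folds first; the record does not say which the authors used.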
format Online
Article
Text
id pubmed-8850175
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Korean Society of Medical Informatics
record_format MEDLINE/PubMed
spelling pubmed-8850175 2022-02-26 Healthc Inform Res Original Article Korean Society of Medical Informatics 2022-01 2022-01-31 /pmc/articles/PMC8850175/ /pubmed/35172091 http://dx.doi.org/10.4258/hir.2022.28.1.58 Text en
© 2022 The Korean Society of Medical Informatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Development and Comparison of Three Data Models for Predicting Diabetes Mellitus Using Risk Factors in a Nigerian Population
topic Original Article