
Early prediction of diabetes by applying data mining techniques: A retrospective cohort study

Bibliographic Details
Main authors: Al Yousef, Mohammed Zeyad, Yasky, Adel Fouad, Al Shammari, Riyad, Ferwana, Mazen S.
Format: Online Article Text
Language: English
Published: Lippincott Williams & Wilkins 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9302319/
https://www.ncbi.nlm.nih.gov/pubmed/35866773
http://dx.doi.org/10.1097/MD.0000000000029588
Description
Summary: Saudi Arabia ranks 7th globally in diabetes prevalence, and this prevalence is expected to reach 45.36% by 2030. The cost of diabetes is expected to rise to 27 billion Saudi riyals if undiagnosed individuals are also included. Prevention and early detection can effectively address these challenges.
OBJECTIVE: To improve healthcare services and to help build predictive models that estimate the probability of diabetes in patients.
METHODS: This retrospective cohort study (a chart review) was conducted at the National Guard Health Affairs in Riyadh, Saudi Arabia. Data were collected from 5 hospitals using National Guard Health Affairs databases. We used 38 attributes of 21,431 patients between 2015 and 2019. The study proceeded in four phases: (1) data collection, (2) data preparation, (3) data mining and model building, and (4) model evaluation and validation. Six algorithms were then compared with and without the synthetic minority oversampling technique (SMOTE).
RESULTS: The Bayesian network performed best, with areas under the curve (AUC) of 0.75 and 0.71.
CONCLUSION: Although the results were acceptable, they could be improved. Missing data caused by technical issues played a major role in limiting the model's performance. Nevertheless, the model could be used in prevention and health monitoring programs, and as an automated mass population screening tool, at no extra cost compared with traditional methods.
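The summary names two technical ingredients without explaining them: oversampling the minority class with SMOTE and scoring models by area under the ROC curve. The sketch below is a minimal, illustrative version of both in plain Python, assuming each patient is a numeric feature vector and labels are 0/1. It is not the authors' implementation; the abstract does not say which software or SMOTE settings they used, and the function names `smote` and `roc_auc` are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical SMOTE sketch: create n_synthetic points, each placed at a
    random position on the segment between a minority-class sample and one of
    its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples to `base`.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
    print(smote(minority, 4, k=2))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region those samples span; this is why SMOTE is often preferred over simply duplicating rows. The pairwise AUC loop is quadratic and meant only to make the definition concrete; an AUC of 0.75, as reported for the Bayesian network, means a diabetic patient is ranked above a non-diabetic one about three times out of four.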