Cargando…

Performance analysis of data mining algorithms for diagnosing COVID-19

BACKGROUND: An outbreak of atypical pneumonia termed COVID-19 has widely spread all over the world since the beginning of 2020. In this regard, designing a prediction system for the early detection of COVID-19 is a critical issue in mitigating virus spread. In this study, we have applied selected ma...

Descripción completa

Detalles Bibliográficos
Autores principales: Nopour, Raoof, Kazemi-Arpanahi, Hadi, Shanbehzadeh, Mostafa, Azizifar, Akbar
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Wolters Kluwer - Medknow 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8719570/
https://www.ncbi.nlm.nih.gov/pubmed/35071611
http://dx.doi.org/10.4103/jehp.jehp_138_21
Descripción
Sumario:BACKGROUND: An outbreak of atypical pneumonia termed COVID-19 has widely spread all over the world since the beginning of 2020. In this regard, designing a prediction system for the early detection of COVID-19 is a critical issue in mitigating virus spread. In this study, we have applied selected machine learning techniques to select the best predictive models based on their performance. MATERIALS AND METHODS: The data of 435 suspicious cases with COVID-19 which were recorded from the Imam Khomeini Hospital database between May 9, 2020 and December 20, 2020, have been taken into consideration. The Chi-square method was used to determine the most important features in diagnosing the COVID-19; eight selected data mining algorithms including multilayer perceptron (MLP), J-48, Bayesian Net (Bayes Net), logistic regression, K-star, random forest, Ada-boost, and sequential minimal optimization (SMO) were applied in data mining. Finally, the most appropriate diagnostic model for COVID-19 was obtained based on comparing the performance of the selected algorithms. RESULTS: As the result of using the Chi-square method, 21 variables were identified as the most important diagnostic criteria in COVID-19. The results of evaluating the eight selected data mining algorithms showed that the J-48 with true-positive rate = 0.85, false-positive rate = 0.173, precision = 0.85, recall = 0.85, F-score = 0.85, Matthews Correlation Coefficient = 0.68, and area under the receiver operator characteristics = 0.68, respectively, had the higher performance than the other algorithms. CONCLUSION: The results of evaluating the performance criteria showed that the J-48 can be considered as a suitable computational prediction model for diagnosing COVID-19 disease.