Cargando…

Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran

BACKGROUND: Due to the high mortality of COVID-19 patients, the use of a high-precision classification model of patient’s mortality that is also interpretable, could help reduce mortality and take appropriate action urgently. In this study, the random forest method was used to select the effective f...

Descripción completa

Detalles Bibliográficos
Autores principales: Moslehi, Samad, Rabiei, Niloofar, Soltanian, Ali Reza, Mamani, Mojgan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9308952/
https://www.ncbi.nlm.nih.gov/pubmed/35871639
http://dx.doi.org/10.1186/s12911-022-01939-x
_version_ 1784753053305405440
author Moslehi, Samad
Rabiei, Niloofar
Soltanian, Ali Reza
Mamani, Mojgan
author_facet Moslehi, Samad
Rabiei, Niloofar
Soltanian, Ali Reza
Mamani, Mojgan
author_sort Moslehi, Samad
collection PubMed
description BACKGROUND: Due to the high mortality of COVID-19 patients, the use of a high-precision classification model of patient’s mortality that is also interpretable, could help reduce mortality and take appropriate action urgently. In this study, the random forest method was used to select the effective features in COVID-19 mortality and the classification was performed using logistic model tree (LMT), classification and regression tree (CART), C4.5, and C5.0 tree based on important features. METHODS: In this retrospective study, the data of 2470 COVID-19 patients admitted to hospitals in Hamadan, west Iran, were used, of which 75.02% recovered and 24.98% died. To classify, at first among the 25 demographic, clinical, and laboratory findings, features with a relative importance more than 6% were selected by random forest. Then LMT, C4.5, C5.0, and CART trees were developed and the accuracy of classification performance was evaluated with recall, accuracy, and F1-score criteria for training, test, and total datasets. At last, the best tree was developed and the receiver operating characteristic curve and area under the curve (AUC) value were reported. RESULTS: The results of this study showed that among demographic and clinical features gender and age, and among laboratory findings blood urea nitrogen, partial thromboplastin time, serum glutamic-oxaloacetic transaminase, and erythrocyte sedimentation rate had more than 6% relative importance. Developing the trees using the above features revealed that the CART with the values of F1-score, Accuracy, and Recall, 0.8681, 0.7824, and 0.955, respectively, for the test dataset and 0.8667, 0.7834, and 0.9385, respectively, for the total dataset had the best performance. The AUC value obtained for the CART was 79.5%. CONCLUSIONS: Finding a highly accurate and qualified model for interpreting the classification of a response that is considered clinically consequential is critical at all stages, including treatment and immediate decision making. In this study, the CART with its high accuracy for diagnosing and classifying mortality of COVID-19 patients as well as prioritizing important demographic, clinical, and laboratory findings in an interpretable format, risk factors for prognosis of COVID-19 patients mortality identify and enable immediate and appropriate decisions for health professionals and physicians.
format Online
Article
Text
id pubmed-9308952
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-93089522022-07-25 Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran Moslehi, Samad Rabiei, Niloofar Soltanian, Ali Reza Mamani, Mojgan BMC Med Inform Decis Mak Research BACKGROUND: Due to the high mortality of COVID-19 patients, the use of a high-precision classification model of patient’s mortality that is also interpretable, could help reduce mortality and take appropriate action urgently. In this study, the random forest method was used to select the effective features in COVID-19 mortality and the classification was performed using logistic model tree (LMT), classification and regression tree (CART), C4.5, and C5.0 tree based on important features. METHODS: In this retrospective study, the data of 2470 COVID-19 patients admitted to hospitals in Hamadan, west Iran, were used, of which 75.02% recovered and 24.98% died. To classify, at first among the 25 demographic, clinical, and laboratory findings, features with a relative importance more than 6% were selected by random forest. Then LMT, C4.5, C5.0, and CART trees were developed and the accuracy of classification performance was evaluated with recall, accuracy, and F1-score criteria for training, test, and total datasets. At last, the best tree was developed and the receiver operating characteristic curve and area under the curve (AUC) value were reported. RESULTS: The results of this study showed that among demographic and clinical features gender and age, and among laboratory findings blood urea nitrogen, partial thromboplastin time, serum glutamic-oxaloacetic transaminase, and erythrocyte sedimentation rate had more than 6% relative importance. Developing the trees using the above features revealed that the CART with the values of F1-score, Accuracy, and Recall, 0.8681, 0.7824, and 0.955, respectively, for the test dataset and 0.8667, 0.7834, and 0.9385, respectively, for the total dataset had the best performance. The AUC value obtained for the CART was 79.5%. CONCLUSIONS: Finding a highly accurate and qualified model for interpreting the classification of a response that is considered clinically consequential is critical at all stages, including treatment and immediate decision making. In this study, the CART with its high accuracy for diagnosing and classifying mortality of COVID-19 patients as well as prioritizing important demographic, clinical, and laboratory findings in an interpretable format, risk factors for prognosis of COVID-19 patients mortality identify and enable immediate and appropriate decisions for health professionals and physicians. BioMed Central 2022-07-24 /pmc/articles/PMC9308952/ /pubmed/35871639 http://dx.doi.org/10.1186/s12911-022-01939-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Moslehi, Samad
Rabiei, Niloofar
Soltanian, Ali Reza
Mamani, Mojgan
Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran
title Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran
title_full Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran
title_fullStr Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran
title_full_unstemmed Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran
title_short Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran
title_sort application of machine learning models based on decision trees in classifying the factors affecting mortality of covid-19 patients in hamadan, iran
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9308952/
https://www.ncbi.nlm.nih.gov/pubmed/35871639
http://dx.doi.org/10.1186/s12911-022-01939-x
work_keys_str_mv AT moslehisamad applicationofmachinelearningmodelsbasedondecisiontreesinclassifyingthefactorsaffectingmortalityofcovid19patientsinhamadaniran
AT rabieiniloofar applicationofmachinelearningmodelsbasedondecisiontreesinclassifyingthefactorsaffectingmortalityofcovid19patientsinhamadaniran
AT soltanianalireza applicationofmachinelearningmodelsbasedondecisiontreesinclassifyingthefactorsaffectingmortalityofcovid19patientsinhamadaniran
AT mamanimojgan applicationofmachinelearningmodelsbasedondecisiontreesinclassifyingthefactorsaffectingmortalityofcovid19patientsinhamadaniran