Comparison of Prediction Models for Mortality Related to Injuries from Road Traffic Accidents after Correcting for Undersampling

In this study, four models—logistic regression (LR), random forest (RF), linear support vector machine (SVM), and radial basis function (RBF)-SVM—were compared for their accuracy in determining mortality caused by road traffic injuries. They were tested using five years of national-level data from t...

Bibliographic Details
Main authors:	Boo, Yookyung, Choi, Youngjin
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8197414/ https://www.ncbi.nlm.nih.gov/pubmed/34073920 http://dx.doi.org/10.3390/ijerph18115604

_version_	1783706915000287232
author	Boo, Yookyung Choi, Youngjin
author_facet	Boo, Yookyung Choi, Youngjin
author_sort	Boo, Yookyung
collection	PubMed
description	In this study, four models were compared for their accuracy in predicting mortality caused by road traffic injuries: logistic regression (LR), random forest (RF), linear support vector machine (SVM), and radial basis function SVM (RBF-SVM). They were tested using five years of national-level data (2013 through 2017) from the Korea Disease Control and Prevention Agency’s (KDCA) National Hospital Discharge In-Depth Survey. Model performance was measured by accuracy, precision, recall, F1 score, and Brier score, using classification analysis that included characteristics of patients, accidents, injuries, and illnesses. The rates of survival and mortality related to road traffic accidents were imbalanced, and the variables were numerous and measured in differing units, so the data were corrected for imbalance and standardized before the classification models’ performances were compared. Using importance analysis, the main diagnosis, the type of injury, the site of the injury, the operation status, the type of accident, the role at the time of the accident, and sex were selected as the analysis factors. The biggest contributing factor was the role in the accident, specifically being the driver, and the major injuries were head injuries and deep injuries. Using the selected factors, a comparison of each model’s classification performance indicated that the RBF-SVM and RF models were superior to the others. Of the SVM models, the RBF kernel model was superior to the linear kernel model; it can be inferred that the RBF model, which transforms the data into a high-dimensional space, performs better when the data are complex because multiple variables are used. The findings suggest there are limitations to analyses of imbalanced, multidimensional original data, such as data on road traffic mortality. Thus, analyses must be performed after imbalances are corrected.
format	Online Article Text
id	pubmed-8197414
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8197414 2021-06-13 Comparison of Prediction Models for Mortality Related to Injuries from Road Traffic Accidents after Correcting for Undersampling Boo, Yookyung Choi, Youngjin Int J Environ Res Public Health Article MDPI 2021-05-24 /pmc/articles/PMC8197414/ /pubmed/34073920 http://dx.doi.org/10.3390/ijerph18115604 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Comparison of Prediction Models for Mortality Related to Injuries from Road Traffic Accidents after Correcting for Undersampling
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8197414/ https://www.ncbi.nlm.nih.gov/pubmed/34073920 http://dx.doi.org/10.3390/ijerph18115604
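
The description above mentions correcting class imbalance, standardizing the data, and scoring models by accuracy, precision, recall, F1 score, and Brier score. The following is a minimal sketch of those steps in plain Python, not the authors' implementation. It assumes random undersampling of the majority class as the imbalance correction (the record does not say which technique was used), and the function names `undersample`, `standardize`, and `classification_metrics` are hypothetical.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumption: binary labels coded as 0 (survived) and 1 (died).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority sample.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def standardize(column):
    """Rescale one numeric column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    variance = sum((x - mean) ** 2 for x in column) / n
    std = variance ** 0.5 or 1.0  # avoid dividing by zero for constant columns
    return [(x - mean) / std for x in column]


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Compute the five metrics named in the description from predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Brier score: mean squared gap between predicted probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "brier": brier,
    }
```

In this sketch the balanced, standardized data would be passed to each classifier, and `classification_metrics` would be applied to each model's predicted probabilities on held-out data. A lower Brier score indicates better-calibrated probabilities, while the other four metrics are higher-is-better.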
