
Comparison of mortality prediction models for road traffic accidents: an ensemble technique for imbalanced data

BACKGROUND: Injuries caused by RTA are classified under the International Classification of Diseases-10 as ‘S00-T99’ and represent imbalanced samples with a mortality rate of only 1.2% among all RTA victims. To predict the characteristics of external causes of road traffic accident (RTA) injuries an...


Bibliographic Details
Main Authors: Boo, Yookyung, Choi, Youngjin
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344638/
https://www.ncbi.nlm.nih.gov/pubmed/35918672
http://dx.doi.org/10.1186/s12889-022-13719-3
_version_ 1784761263101837312
author Boo, Yookyung
Choi, Youngjin
author_facet Boo, Yookyung
Choi, Youngjin
author_sort Boo, Yookyung
collection PubMed
description BACKGROUND: Injuries caused by road traffic accidents (RTAs) are classified under the International Classification of Diseases-10 as ‘S00-T99’ and represent imbalanced samples, with a mortality rate of only 1.2% among all RTA victims. To predict the characteristics of external causes of RTA injuries and mortality, we compared performance across different correction and classification techniques for imbalanced samples.
METHODS: The present study used data spanning a 5-year period (2013–2017) from the Korean National Hospital Discharge In-depth Injury Survey (KNHDS), a national-level survey conducted by the Korea Disease Control and Prevention Agency. A total of eight variables were used in the prediction, including patient, accident, and injury/disease characteristics. As the data were imbalanced, a sample consisting of only severe injuries was constructed and compared against the total sample. The samples were preprocessed according to their characteristics: they were standardized first, because they contained many variables with different units. Among the ensemble techniques for classification, the present study used Random Forest, Extra-Trees, and XGBoost. Four different over- and under-sampling techniques were applied, and the performance of the algorithms was compared using “accuracy”, “precision”, “recall”, “F1”, and “MCC”.
RESULTS: Among the prediction techniques, XGBoost had the best performance. While the synthetic minority oversampling technique (SMOTE), a type of over-sampling, also demonstrated a certain level of performance, under-sampling was superior. Overall, prediction by the XGBoost model with samples using SMOTE produced the best results.
CONCLUSION: This study presented the results of an empirical comparison of the validity of sampling techniques and classification algorithms that affect the accuracy of imbalanced samples by combining two techniques. The findings could be used as reference data in classification analyses of imbalanced data in the medical field.
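The abstract evaluates classifiers with accuracy, precision, recall, F1, and MCC on data where only 1.2% of victims die. The following is a minimal sketch of why accuracy alone can mislead at that class ratio. It is not the authors' code: the 1,000-victim sample, both sets of predictions, and all function names are hypothetical and chosen only to match the 1.2% mortality rate stated above.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 = death (minority) and 0 = survival."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def metrics(y_true, y_pred):
    """Compute the five metrics named in the abstract for binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }

# Hypothetical sample: 1,000 victims, 12 deaths (1.2% mortality).
y_true = [1] * 12 + [0] * 988

# Trivial model that predicts "survival" for everyone.
baseline = metrics(y_true, [0] * 1000)

# Hypothetical model: finds 9 of 12 deaths and raises 40 false alarms.
y_pred = [1] * 9 + [0] * 3 + [1] * 40 + [0] * 948
model = metrics(y_true, y_pred)

print(baseline)
print(model)
```

In this illustration the trivial model reaches 98.8% accuracy while its recall and MCC are both zero, whereas the hypothetical model has lower accuracy but non-zero recall and MCC. This is the kind of gap that motivates reporting recall, F1, and MCC alongside accuracy for imbalanced samples.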
format Online
Article
Text
id pubmed-9344638
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9344638 2022-08-03 Comparison of mortality prediction models for road traffic accidents: an ensemble technique for imbalanced data Boo, Yookyung Choi, Youngjin BMC Public Health Research BioMed Central 2022-08-02 /pmc/articles/PMC9344638/ /pubmed/35918672 http://dx.doi.org/10.1186/s12889-022-13719-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Comparison of mortality prediction models for road traffic accidents: an ensemble technique for imbalanced data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344638/
https://www.ncbi.nlm.nih.gov/pubmed/35918672
http://dx.doi.org/10.1186/s12889-022-13719-3