Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework

Bibliographic Details
Main authors: Wang, Ke; Xue, Qingwen; Lu, Jian John
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8305749/
https://www.ncbi.nlm.nih.gov/pubmed/34299986
http://dx.doi.org/10.3390/ijerph18147534
Collection: PubMed
Abstract: Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Because driving data are inherently class-imbalanced, high-risk samples, as the minority class, are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling method, cost-sensitive loss function, and probability calibration method to handle the class-imbalance problem in risky driver recognition. The hyperparameters that control the sampling ratio and class weight, along with the other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed framework, we build a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle’s longitudinal speed, its lateral speed, and the gap between it and its preceding vehicle. Across 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of the automated search is consistent with that of a manual search but is far more computationally efficient. We find that combining Support Vector Machine-based Synthetic Minority Oversampling Technique (SVMSMOTE) sampling, a cost-sensitive cross-entropy loss function, and isotonic regression calibration significantly improves recognition ability and reduces the error of the predicted probabilities.
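Two of the components the abstract reports as selected by the automated search are a cost-sensitive (class-weighted) cross-entropy loss and isotonic-regression probability calibration. The sketch below illustrates both in plain Python under stated assumptions. It is not the authors' code: the function names and the weights `w_pos`/`w_neg` are hypothetical, the loss is the standard class-weighted binary cross-entropy (the paper's exact formulation is not given in this record), and the calibrator is a from-scratch pool-adjacent-violators (PAV) fit that returns a step function. Library implementations such as scikit-learn's `IsotonicRegression` interpolate linearly between fitted points instead.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def weighted_cross_entropy(y_true: Sequence[int], p_pred: Sequence[float],
                           w_pos: float = 1.0, w_neg: float = 1.0,
                           eps: float = 1e-12) -> float:
    """Mean class-weighted binary cross-entropy (illustrative sketch).

    Log-loss terms for positive (risky, minority) samples are scaled by
    w_pos and those for negative samples by w_neg. Both weights are
    hypothetical hyperparameters, not values from the paper.
    """
    if len(y_true) != len(p_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        if y == 1:
            total -= w_pos * math.log(p)
        else:
            total -= w_neg * math.log(1.0 - p)
    return total / len(y_true)


def fit_isotonic(scores: Sequence[float],
                 labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Fit a non-decreasing step function score -> P(risky) with PAV.

    Returns (block_starts, block_values): block i covers scores from
    block_starts[i] up to the next block's start.
    """
    # Each block is [sum_of_labels, count, first_score, last_score].
    blocks: List[list] = []
    for s, y in sorted(zip(scores, labels)):
        if blocks and blocks[-1][3] == s:
            # Tied scores must share one fitted value.
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, s, s])
        # Pool adjacent blocks while their means violate monotonicity.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            y_sum, n, _, last = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += n
            blocks[-1][3] = last
    starts = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return starts, values


def predict_isotonic(model: Tuple[List[float], List[float]],
                     score: float) -> float:
    """Calibrated probability for one raw score (constant within a block)."""
    starts, values = model
    # Last block whose first score is <= score; scores below the fitted
    # range fall back to the first block.
    idx = max(bisect.bisect_right(starts, score) - 1, 0)
    return values[idx]


if __name__ == "__main__":
    # Toy held-out data: raw classifier scores and true labels (1 = risky).
    raw = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    y = [0, 1, 0, 0, 1, 1]
    model = fit_isotonic(raw, y)
    print([predict_isotonic(model, s) for s in raw])
    print(weighted_cross_entropy(y, raw, w_pos=5.0))
```

Run as a script, the demo fits the calibrator on six toy scores and prints the calibrated values and a weighted loss. In practice a calibrator like this would be fit on held-out predictions rather than on the data used to train the classifier. A common heuristic starting point for the positive-class weight is n_neg / n_pos, roughly 22 at the 4.29% risky-driver rate above; in the framework described here, such weights are instead searched by Bayesian optimization.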
Record ID: pubmed-8305749
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Int J Environ Res Public Health, MDPI, 2021-07-15
Rights: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).