Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework
Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalance nature of driving data, high-risk samples as the minority class are usually ill-treated by standard classification algorithms. Instead of applying preset samp...
Main authors: | Wang, Ke; Xue, Qingwen; Lu, Jian John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8305749/ https://www.ncbi.nlm.nih.gov/pubmed/34299986 http://dx.doi.org/10.3390/ijerph18147534 |
_version_ | 1783727646706761728 |
---|---|
author | Wang, Ke Xue, Qingwen Lu, Jian John |
author_facet | Wang, Ke Xue, Qingwen Lu, Jian John |
author_sort | Wang, Ke |
collection | PubMed |
description | Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Because driving data are class-imbalanced, high-risk samples, as the minority class, are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling method, cost-sensitive loss function, and probability calibration method to handle the class-imbalance problem in the recognition of risky drivers. The hyperparameters that control the sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle's longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, a cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of the predicted probability. |
format | Online Article Text |
id | pubmed-8305749 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8305749 2021-07-25 Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework Wang, Ke Xue, Qingwen Lu, Jian John Int J Environ Res Public Health Article MDPI 2021-07-15 /pmc/articles/PMC8305749/ /pubmed/34299986 http://dx.doi.org/10.3390/ijerph18147534 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Wang, Ke Xue, Qingwen Lu, Jian John Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework |
title | Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework |
title_full | Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework |
title_fullStr | Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework |
title_full_unstemmed | Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework |
title_short | Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework |
title_sort | risky driver recognition with class imbalance data and automated machine learning framework |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8305749/ https://www.ncbi.nlm.nih.gov/pubmed/34299986 http://dx.doi.org/10.3390/ijerph18147534 |
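
The description above names two ingredients that are easy to illustrate in isolation: discrete Fourier transform (DFT) coefficients as model inputs, and a cost-sensitive cross-entropy loss. The following is a minimal standard-library sketch, not the authors' implementation. The function names, the use of coefficient magnitudes, the normalization by series length, the `pos_weight` parameter, and the toy speed series are all assumptions made for illustration; the record does not state how the paper defines them.

```python
import cmath
import math


def dft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a real series.

    Assumption: magnitudes normalized by series length are used as features;
    the record does not say which form of the coefficients the paper uses.
    """
    n = len(signal)
    mags = []
    for k in range(n_coeffs):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(coeff) / n)
    return mags


def weighted_cross_entropy(y_true, p_pred, pos_weight):
    """Mean binary cross-entropy with the positive (risky) class up-weighted.

    Assumption: a single positive-class weight is one common way to make
    cross-entropy cost-sensitive; the paper's exact loss may differ.
    """
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical constant longitudinal-speed series (m/s), for illustration only.
speed = [30.0] * 8
features = dft_magnitudes(speed, 3)

# Same predictions, scored with and without extra weight on the risky class.
plain_loss = weighted_cross_entropy([1, 0], [0.5, 0.5], pos_weight=1.0)
weighted_loss = weighted_cross_entropy([1, 0], [0.5, 0.5], pos_weight=3.0)
```

For a constant series, all of the signal sits in the zero-frequency coefficient, so only the first feature is non-zero. Raising `pos_weight` increases the penalty for uncertain predictions on risky drivers, which is the kind of class weight the description says is tuned by Bayesian optimization.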