A Bibliometric Analysis and Benchmark of Machine Learning and AutoML in Crash Severity Prediction: The Case Study of Three Colombian Cities

Traffic accidents are of worldwide concern, as they are one of the leading causes of death globally. One policy designed to cope with them is the design and deployment of road safety systems. These aim to predict crashes based on historical records, provided by new Internet of Things (IoT) technolog...

Bibliographic Details
Main Authors:	Angarita-Zapata, Juan S.; Maestre-Gongora, Gina; Calderín, Jenny Fajardo
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8708527/ https://www.ncbi.nlm.nih.gov/pubmed/34960494 http://dx.doi.org/10.3390/s21248401

_version_	1784622707498811392
author	Angarita-Zapata, Juan S. Maestre-Gongora, Gina Calderín, Jenny Fajardo
author_facet	Angarita-Zapata, Juan S. Maestre-Gongora, Gina Calderín, Jenny Fajardo
author_sort	Angarita-Zapata, Juan S.
collection	PubMed
description	Traffic accidents are of worldwide concern, as they are one of the leading causes of death globally. One policy designed to cope with them is the design and deployment of road safety systems. These aim to predict crashes based on historical records, provided by new Internet of Things (IoT) technologies, to enhance traffic flow management and promote safer roads. Increasing data availability has helped machine learning (ML) to address the prediction of crashes and their severity. The literature reports numerous contributions in the form of survey papers, experimental comparisons of various techniques, and the design of new methods at the point where crash severity prediction (CSP) and ML converge. Despite such progress, and as far as we know, there are no comprehensive research articles that theoretically and practically approach the model selection problem (MSP) in CSP. Thus, this paper introduces a bibliometric analysis and experimental benchmark of ML and automated machine learning (AutoML) as a suitable approach to automatically address the MSP in CSP. Firstly, 2318 bibliographic references were consulted to identify relevant authors, trending topics, keyword evolution, and the most common ML methods used in related case studies, which revealed an opportunity for the use of AutoML in the transportation field. Then, we compared AutoML (AutoGluon, Auto-sklearn, TPOT) and ML (CatBoost, Decision Tree, Extra Trees, Gradient Boosting, Gaussian Naive Bayes, Light Gradient Boosting Machine, Random Forest) methods in three case studies using open data portals belonging to the cities of Medellín, Bogotá, and Bucaramanga in Colombia. Our experimentation reveals that AutoGluon and CatBoost are competitive and robust ML approaches to deal with various CSP problems. In addition, we concluded that general-purpose AutoML effectively supports the MSP in CSP without developing domain-focused AutoML methods for this supervised learning problem. Finally, based on the results obtained, we introduce challenges and research opportunities that the community should explore to enhance the contributions that ML and AutoML can bring to CSP and other transportation areas.
format	Online Article Text
id	pubmed-8708527
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8708527 2021-12-25 A Bibliometric Analysis and Benchmark of Machine Learning and AutoML in Crash Severity Prediction: The Case Study of Three Colombian Cities Angarita-Zapata, Juan S. Maestre-Gongora, Gina Calderín, Jenny Fajardo Sensors (Basel) Article MDPI 2021-12-16 /pmc/articles/PMC8708527/ /pubmed/34960494 http://dx.doi.org/10.3390/s21248401 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	A Bibliometric Analysis and Benchmark of Machine Learning and AutoML in Crash Severity Prediction: The Case Study of Three Colombian Cities
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8708527/ https://www.ncbi.nlm.nih.gov/pubmed/34960494 http://dx.doi.org/10.3390/s21248401
