
Simulator Pre-Screening of Underprepared Drivers Prior to Licensing On-Road Examination: Clustering of Virtual Driving Test Time Series Data


Bibliographic Details
Main Authors:	Grethlein, David, Winston, Flaura Koplin, Walshe, Elizabeth, Tanner, Sean, Kandadai, Venk, Ontañón, Santiago
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2020
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7333075/ https://www.ncbi.nlm.nih.gov/pubmed/32554384 http://dx.doi.org/10.2196/13995

author	Grethlein, David Winston, Flaura Koplin Walshe, Elizabeth Tanner, Sean Kandadai, Venk Ontañón, Santiago
collection	PubMed
description	BACKGROUND: A large Midwestern state commissioned a virtual driving test (VDT) to assess driving skills preparedness before the on-road examination (ORE). Since July 2017, a pilot deployment of the VDT in state licensing centers (VDT pilot) has collected both VDT and ORE data from new license applicants with the aim of creating a scoring algorithm that could predict those who were underprepared. OBJECTIVE: Leveraging data collected from the VDT pilot, this study aimed to develop and conduct an initial evaluation of a novel machine learning (ML)–based classifier that uses limited domain knowledge and minimal feature engineering to reliably predict applicant pass/fail on the ORE. Such methods, if proven useful, could be applicable to the classification of other time series data collected in medical and other settings. METHODS: We analyzed an initial dataset of 4308 drivers who completed both the VDT and the ORE, of whom 1096 (25.4%) went on to fail the ORE. We studied 2 approaches to constructing feature sets for use as input to ML algorithms: the standard method of reducing the time series data to a set of manually defined variables that summarize driving behavior, and a novel approach using time series clustering. We then fed these representations into different ML algorithms to compare their ability to predict a driver’s ORE outcome (pass/fail). RESULTS: The new method using time series clustering performed similarly to the standard method in terms of overall accuracy for predicting the pass or fail outcome (76.1% vs 76.2%) and area under the curve (0.656 vs 0.682). However, time series clustering slightly outperformed the standard method in differentially predicting failure on the ORE. The novel clustering method yielded a risk ratio for failure of 3.07 (95% CI 2.75-3.43), whereas the standard variables method yielded a risk ratio for failure of 2.68 (95% CI 2.41-2.99). In addition, the time series clustering method with logistic regression produced the lowest ratio of false alarms (those who were predicted to fail but went on to pass the ORE; 27.2%). CONCLUSIONS: Our results provide initial evidence that the clustering method is useful for feature construction in classification tasks involving time series data when resources for creating multiple domain-relevant variables are limited.
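The abstract reports accuracy, a risk ratio for failure with a 95% CI, and a false-alarm ratio, but does not spell out how they are computed. The Python sketch below shows one conventional way to derive all three from a 2×2 table of predicted versus actual ORE outcomes. It assumes that the risk ratio compares the failure rate among applicants predicted to fail with the failure rate among those predicted to pass, that the CI is a Wald interval on the log scale, and that the false-alarm ratio is taken over everyone predicted to fail; the paper may define these differently. The `Confusion` class, the function names, and the counts in the usage example are hypothetical and are not the study's data.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 table with ORE failure as the positive class."""
    tp: int  # predicted fail, actually failed the ORE
    fp: int  # predicted fail, actually passed (a "false alarm")
    fn: int  # predicted pass, actually failed
    tn: int  # predicted pass, actually passed


def accuracy(c: Confusion) -> float:
    """Share of all applicants whose pass/fail outcome was predicted correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def false_alarm_ratio(c: Confusion) -> float:
    """ASSUMPTION: denominator is everyone predicted to fail."""
    return c.fp / (c.tp + c.fp)


def risk_ratio(c: Confusion, z: float = 1.96) -> tuple[float, float, float]:
    """ASSUMPTION: risk of failing when predicted to fail vs. when predicted
    to pass, with a Wald confidence interval computed on the log scale."""
    risk_flagged = c.tp / (c.tp + c.fp)  # P(fail | predicted fail)
    risk_cleared = c.fn / (c.fn + c.tn)  # P(fail | predicted pass)
    rr = risk_flagged / risk_cleared
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / c.tp - 1 / (c.tp + c.fp) + 1 / c.fn - 1 / (c.fn + c.tn))
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not taken from the study.
    table = Confusion(tp=60, fp=40, fn=200, tn=700)
    rr, lo, hi = risk_ratio(table)
    print(f"accuracy {accuracy(table):.2f}")                     # accuracy 0.76
    print(f"false-alarm ratio {false_alarm_ratio(table):.2f}")   # false-alarm ratio 0.40
    print(f"risk ratio {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")     # risk ratio 2.70 (95% CI 2.21-3.30)
```

Computing the interval on the log scale keeps it asymmetric around the point estimate, which matches the shape of the intervals quoted in the abstract (for example, 3.07 with 95% CI 2.75-3.43).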
format	Online Article Text
id	pubmed-7333075
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-7333075 2020-07-06 J Med Internet Res Original Paper JMIR Publications 2020-06-18 /pmc/articles/PMC7333075/ /pubmed/32554384 http://dx.doi.org/10.2196/13995 Text en ©David Grethlein, Flaura Koplin Winston, Elizabeth Walshe, Sean Tanner, Venk Kandadai, Santiago Ontañón. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.06.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
title	Simulator Pre-Screening of Underprepared Drivers Prior to Licensing On-Road Examination: Clustering of Virtual Driving Test Time Series Data
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7333075/ https://www.ncbi.nlm.nih.gov/pubmed/32554384 http://dx.doi.org/10.2196/13995

