A supervised machine learning model for imputing missing boarding stops in smart card data

Bibliographic Details
Main authors: Shalit, Nadav; Fire, Michael; Ben-Elia, Eran
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9734418/
http://dx.doi.org/10.1007/s12469-022-00309-0
author Shalit, Nadav
Fire, Michael
Ben-Elia, Eran
collection PubMed
description Public transport has become an essential part of urban existence with increased population densities and environmental awareness. Large quantities of data are currently generated, allowing for more robust methods to understand travel behavior by harvesting smart card usage. However, public transport datasets suffer from data integrity problems; boarding stop information may be missing due to imperfect acquirement processes or inadequate reporting. This study introduces a supervised machine learning method to impute missing boarding stops based on ordinal classification using GTFS timetable, smart card, and geospatial datasets. A new metric, Pareto Accuracy, is suggested to evaluate algorithms where classes have an ordinal nature. The results are based on a case study in the city of Beer Sheva, Israel, consisting of one month of smart card data. We show that our proposed method is robust to irregular travelers and significantly outperforms well-known imputation methods without the need to mine any additional datasets. The data validation from another Israeli city using transfer learning shows the presented model is general and context-free. The implications for transportation planning and travel behavior research are further discussed.
format Online
Article
Text
id pubmed-9734418
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-9734418 2022-12-12 A supervised machine learning model for imputing missing boarding stops in smart card data Shalit, Nadav Fire, Michael Ben-Elia, Eran Public Transp Original Research Springer Berlin Heidelberg 2022-12-07 2023 /pmc/articles/PMC9734418/ http://dx.doi.org/10.1007/s12469-022-00309-0 Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title A supervised machine learning model for imputing missing boarding stops in smart card data
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9734418/
http://dx.doi.org/10.1007/s12469-022-00309-0