A supervised machine learning model for imputing missing boarding stops in smart card data
Public transport has become an essential part of urban existence with increased population densities and environmental awareness. Large quantities of data are currently generated, allowing for more robust methods to understand travel behavior by harvesting smart card usage. However, public transport...
Main authors: | Shalit, Nadav; Fire, Michael; Ben-Elia, Eran |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg, 2022 |
Subjects: | Original Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9734418/ http://dx.doi.org/10.1007/s12469-022-00309-0 |
_version_ | 1784846581778874368 |
---|---|
author | Shalit, Nadav; Fire, Michael; Ben-Elia, Eran |
author_facet | Shalit, Nadav; Fire, Michael; Ben-Elia, Eran |
author_sort | Shalit, Nadav |
collection | PubMed |
description | Public transport has become an essential part of urban existence with increased population densities and environmental awareness. Large quantities of data are currently generated, allowing for more robust methods to understand travel behavior by harvesting smart card usage. However, public transport datasets suffer from data integrity problems; boarding stop information may be missing due to imperfect acquirement processes or inadequate reporting. This study introduces a supervised machine learning method to impute missing boarding stops based on ordinal classification using GTFS timetable, smart card, and geospatial datasets. A new metric, Pareto Accuracy, is suggested to evaluate algorithms where classes have an ordinal nature. The results are based on a case study in the city of Beer Sheva, Israel, consisting of one month of smart card data. We show that our proposed method is robust to irregular travelers and significantly outperforms well-known imputation methods without the need to mine any additional datasets. The data validation from another Israeli city using transfer learning shows the presented model is general and context-free. The implications for transportation planning and travel behavior research are further discussed. |
format | Online Article Text |
id | pubmed-9734418 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-9734418 2022-12-12 A supervised machine learning model for imputing missing boarding stops in smart card data Shalit, Nadav; Fire, Michael; Ben-Elia, Eran Public Transp Original Research Springer Berlin Heidelberg 2022-12-07 2023 /pmc/articles/PMC9734418/ http://dx.doi.org/10.1007/s12469-022-00309-0 Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Original Research Shalit, Nadav Fire, Michael Ben-Elia, Eran A supervised machine learning model for imputing missing boarding stops in smart card data |
title | A supervised machine learning model for imputing missing boarding stops in smart card data |
title_full | A supervised machine learning model for imputing missing boarding stops in smart card data |
title_fullStr | A supervised machine learning model for imputing missing boarding stops in smart card data |
title_full_unstemmed | A supervised machine learning model for imputing missing boarding stops in smart card data |
title_short | A supervised machine learning model for imputing missing boarding stops in smart card data |
title_sort | supervised machine learning model for imputing missing boarding stops in smart card data |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9734418/ http://dx.doi.org/10.1007/s12469-022-00309-0 |