
An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data

Various traffic-sensing technologies have been employed to facilitate traffic control. Due to certain factors, e.g., malfunctioning devices and artificial mistakes, missing values typically occur in the Intelligent Transportation System (ITS) sensing datasets, resulting in a decrease in the data qua...


Bibliographic Details
Main Authors: Huang, Junsheng; Mao, Baohua; Bai, Yun; Zhang, Tong; Miao, Changjun
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7181140/
https://www.ncbi.nlm.nih.gov/pubmed/32252432
http://dx.doi.org/10.3390/s20071992
_version_ 1783525980376137728
author Huang, Junsheng
Mao, Baohua
Bai, Yun
Zhang, Tong
Miao, Changjun
collection PubMed
description Various traffic-sensing technologies have been employed to facilitate traffic control. Due to certain factors, e.g., malfunctioning devices and artificial mistakes, missing values typically occur in the Intelligent Transportation System (ITS) sensing datasets, resulting in a decrease in the data quality. In this study, an integrated imputation algorithm based on fuzzy C-means (FCM) and the genetic algorithm (GA) is proposed to improve the accuracy of the estimated values. The GA is applied to optimize the parameter of the membership degree and the number of cluster centroids in the FCM model. An experimental test of the taxi global positioning system (GPS) data in Manhattan, New York City, is employed to demonstrate the effectiveness of the integrated imputation approach. Three evaluation criteria, the root mean squared error (RMSE), correlation coefficient (R), and relative accuracy (RA), are used to verify the experimental results. Under the ±5% and ±10% thresholds, the average RAs obtained by the integrated imputation method are 0.576 and 0.785, which remain the highest among different methods, indicating that the integrated imputation method outperforms the history imputation method and the conventional FCM method. On the other hand, the clustering imputation performance with the Euclidean distance is better than that with the Manhattan distance. Thus, our proposed integrated imputation method can be employed to estimate the missing values in the daily traffic management.
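The clustering-based imputation and the evaluation criteria named in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a partial-distance variant of fuzzy C-means (distances computed over observed features only), fills each missing entry with the membership-weighted average of the cluster centroids, and fixes the fuzzifier `m` and the cluster count `n_clusters` by hand instead of tuning them with a genetic algorithm. The function names `fcm_impute`, `rmse`, and `relative_accuracy` are hypothetical, and the reading of RA as the share of estimates whose relative error falls within the threshold is likewise an assumption.

```python
import numpy as np


def fcm_impute(X, n_clusters=2, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fill NaNs in X (n_samples, n_features) using partial-distance fuzzy C-means."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                  # True where a value is known
    X0 = np.where(observed, X, 0.0)          # zeros stand in for NaNs inside sums
    n, d = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)        # each row of memberships sums to 1
    n_obs = np.maximum(observed.sum(axis=1, keepdims=True), 1)
    for _ in range(n_iter):
        W = U ** m
        # Centroid update: weighted mean over the observed entries of each feature.
        V = (W.T @ X0) / np.maximum(W.T @ observed.astype(float), 1e-12)
        # Squared Euclidean distance over observed features, rescaled to d features.
        diff = (X0[:, None, :] - V[None, :, :]) * observed[:, None, :]
        d2 = np.maximum((diff ** 2).sum(axis=2) * d / n_obs, 1e-12)
        # Standard FCM membership update.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    X_hat = U @ V                            # membership-weighted centroid per sample
    return np.where(observed, X, X_hat)      # known values are left untouched


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_accuracy(y_true, y_pred, threshold=0.05):
    """Share of estimates whose relative error is within +/- threshold (assumed definition)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rel_err = np.abs(y_pred - y_true) / np.abs(y_true)
    return float(np.mean(rel_err <= threshold))


def correlation(y_true, y_pred):
    """Pearson correlation coefficient R between true and estimated values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

To evaluate such a sketch, one would hide a subset of known values, call `fcm_impute` on the masked array, and compare the filled entries against the hidden ones with `rmse`, `correlation`, and `relative_accuracy` at thresholds of 0.05 and 0.10.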
format Online
Article
Text
id pubmed-7181140
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7181140 2020-04-28 Sensors (Basel) Article MDPI 2020-04-02 /pmc/articles/PMC7181140/ /pubmed/32252432 http://dx.doi.org/10.3390/s20071992 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data
topic Article