Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods

Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to p...

Bibliographic Details
Main Authors:	Karamanou, Areti, Brimos, Petros, Kalampokis, Evangelos, Tarabanis, Konstantinos
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9781156/ https://www.ncbi.nlm.nih.gov/pubmed/36560054 http://dx.doi.org/10.3390/s22249684

_version_	1784857004857098240
author	Karamanou, Areti Brimos, Petros Kalampokis, Evangelos Tarabanis, Konstantinos
author_sort	Karamanou, Areti
collection	PubMed
description	Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD.
format	Online Article Text
id	pubmed-9781156
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9781156 2022-12-24 Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods Karamanou, Areti Brimos, Petros Kalampokis, Evangelos Tarabanis, Konstantinos Sensors (Basel) Article MDPI 2022-12-10 /pmc/articles/PMC9781156/ /pubmed/36560054 http://dx.doi.org/10.3390/s22249684 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9781156/ https://www.ncbi.nlm.nih.gov/pubmed/36560054 http://dx.doi.org/10.3390/s22249684