Embedded Data Imputation for Environmental Intelligent Sensing: A Case Study

Recent developments in cloud computing and the Internet of Things have enabled smart environments, in terms of both monitoring and actuation. Unfortunately, this often results in unsustainable cloud-based solutions, whereby, in the interest of simplicity, a wealth of raw (unprocessed) data are pushe...

Bibliographic Details
Main Authors: Erhan, Laura; Di Mauro, Mario; Anjum, Ashiq; Bagdasar, Ovidiu; Song, Wei; Liotta, Antonio
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659818/
https://www.ncbi.nlm.nih.gov/pubmed/34883778
http://dx.doi.org/10.3390/s21237774
author Erhan, Laura
Di Mauro, Mario
Anjum, Ashiq
Bagdasar, Ovidiu
Song, Wei
Liotta, Antonio
collection PubMed
description Recent developments in cloud computing and the Internet of Things have enabled smart environments, in terms of both monitoring and actuation. Unfortunately, this often results in unsustainable cloud-based solutions, whereby, in the interest of simplicity, a wealth of raw (unprocessed) data are pushed from sensor nodes to the cloud. Herein, we advocate the use of machine learning at sensor nodes to perform essential data-cleaning operations, to avoid the transmission of corrupted (often unusable) data to the cloud. Starting from a public pollution dataset, we investigate how two machine learning techniques (kNN and missForest) may be embedded on Raspberry Pi to perform data imputation, without impacting the data collection process. Our experimental results demonstrate the accuracy and computational efficiency of edge-learning methods for filling in missing data values in corrupted data series. We find that kNN and missForest correctly impute up to 40% of randomly distributed missing values, with a density distribution of values that is indistinguishable from the benchmark. We also show a trade-off analysis for the case of bursty missing values, with recoverable blocks of up to 100 samples. Computation times are shorter than sampling periods, allowing for data imputation at the edge in a timely manner.
format Online
Article
Text
id pubmed-8659818
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8659818 2021-12-10 Sensors (Basel) Article MDPI 2021-11-23 /pmc/articles/PMC8659818/ /pubmed/34883778 http://dx.doi.org/10.3390/s21237774 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Embedded Data Imputation for Environmental Intelligent Sensing: A Case Study
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659818/
https://www.ncbi.nlm.nih.gov/pubmed/34883778
http://dx.doi.org/10.3390/s21237774