Analyzing Particularities of Sensor Datasets for Supporting Data Understanding and Preparation

Data scientists spend much time with data cleaning tasks, and this is especially important when dealing with data gathered from sensors, as finding failures is not unusual (there is an abundance of research on anomaly detection in sensor data). This work analyzes several aspects of the data generate...

Bibliographic Details
Main authors: Nieto, Francisco Javier; Aguilera, Unai; López-de-Ipiña, Diego
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8472945/
https://www.ncbi.nlm.nih.gov/pubmed/34577271
http://dx.doi.org/10.3390/s21186063
author Nieto, Francisco Javier
Aguilera, Unai
López-de-Ipiña, Diego
collection PubMed
description Data scientists spend much time with data cleaning tasks, and this is especially important when dealing with data gathered from sensors, as finding failures is not unusual (there is an abundance of research on anomaly detection in sensor data). This work analyzes several aspects of the data generated by different sensor types to understand particularities in the data, linking them with existing data mining methodologies. Using data from different sources, this work analyzes how the type of sensor used and its measurement units have an important impact in basic statistics such as variance and mean, because of the statistical distributions of the datasets. The work also analyzes the behavior of outliers, how to detect them, and how they affect the equivalence of sensors, as equivalence is used in many solutions for identifying anomalies. Based on the previous results, the article presents guidance on how to deal with data coming from sensors, in order to understand the characteristics of sensor datasets, and proposes a parallelized implementation. Finally, the article shows that the proposed decision-making processes work well with a new type of sensor and that parallelizing with several cores enables calculations to be executed up to four times faster.
format Online
Article
Text
id pubmed-8472945
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8472945 2021-09-28 Sensors (Basel) Article MDPI 2021-09-10 /pmc/articles/PMC8472945/ /pubmed/34577271 http://dx.doi.org/10.3390/s21186063 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Analyzing Particularities of Sensor Datasets for Supporting Data Understanding and Preparation
topic Article