ELI: an IoT-aware big data pipeline with data curation and data quality

Bibliographic Details
Main Authors: de Haro-Olmo, Francisco José; Valencia-Parra, Alvaro; Varela-Vaca, Ángel Jesús; Álvarez-Bermejo, José Antonio; Gómez-López, María Teresa
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Data Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10557504/
https://www.ncbi.nlm.nih.gov/pubmed/37810363
http://dx.doi.org/10.7717/peerj-cs.1605
author de Haro-Olmo, Francisco José
Valencia-Parra, Alvaro
Varela-Vaca, Ángel Jesús
Álvarez-Bermejo, José Antonio
Gómez-López, María Teresa
collection PubMed
description The complexity of analysing data from IoT sensors requires the use of Big Data technologies, posing challenges such as data curation and data quality assessment. Failing to address both aspects can lead to erroneous decision-making (e.g., processing incorrectly treated data, introducing errors into processes, causing damage or increasing costs). This article presents ELI, an IoT-based Big Data pipeline for developing a data curation process and assessing the usability of data collected by IoT sensors in both offline and online scenarios. We propose a pipeline that combines data transformation and integration tools with a customisable decision model, based on the Decision Model and Notation (DMN), to evaluate data quality. Our study emphasises the importance of data curation and quality when integrating IoT information: low-quality data that obstruct meaningful insights and introduce errors in decision-making are identified and discarded. We evaluated our approach in a smart farm scenario using agricultural humidity and temperature data collected from various types of sensors. The proposed model exhibited consistent results in offline and online (stream data) scenarios. In addition, a performance evaluation was carried out, demonstrating the effectiveness of the approach. In summary, this article contributes a usable and effective IoT-based Big Data pipeline with data curation capabilities that assesses data usability in both online and offline scenarios. Additionally, it introduces customisable decision models for measuring data quality across multiple dimensions.
format Online
Article
Text
id pubmed-10557504
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10557504 2023-10-07 ELI: an IoT-aware big data pipeline with data curation and data quality de Haro-Olmo, Francisco José Valencia-Parra, Alvaro Varela-Vaca, Ángel Jesús Álvarez-Bermejo, José Antonio Gómez-López, María Teresa PeerJ Comput Sci Data Science PeerJ Inc. 2023-10-02 /pmc/articles/PMC10557504/ /pubmed/37810363 http://dx.doi.org/10.7717/peerj-cs.1605 Text en ©2023 de Haro-Olmo et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title ELI: an IoT-aware big data pipeline with data curation and data quality
topic Data Science