ELI: an IoT-aware big data pipeline with data curation and data quality
The complexity of analysing data from IoT sensors requires the use of Big Data technologies, posing challenges such as data curation and data quality assessment. Failing to address both aspects can lead to erroneous decision-making (e.g., processing incorrectly treated data, introducing errors i...
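Main Authors: | de Haro-Olmo, Francisco José; Valencia-Parra, Alvaro; Varela-Vaca, Ángel Jesús; Álvarez-Bermejo, José Antonio; Gómez-López, María Teresa |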
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10557504/ https://www.ncbi.nlm.nih.gov/pubmed/37810363 http://dx.doi.org/10.7717/peerj-cs.1605 |
_version_ | 1785117104019603456 |
---|---|
author | de Haro-Olmo, Francisco José Valencia-Parra, Alvaro Varela-Vaca, Ángel Jesús Álvarez-Bermejo, José Antonio Gómez-López, María Teresa |
author_sort | de Haro-Olmo, Francisco José |
collection | PubMed |
description | The complexity of analysing data from IoT sensors requires the use of Big Data technologies, posing challenges such as data curation and data quality assessment. Failing to address both aspects can lead to erroneous decision-making (e.g., processing incorrectly treated data, introducing errors into processes, causing damage or increasing costs). This article presents ELI, an IoT-based Big Data pipeline for developing a data curation process and assessing the usability of data collected by IoT sensors in both offline and online scenarios. We propose a pipeline that integrates data transformation and integration tools with a customisable decision model based on the Decision Model and Notation (DMN) to evaluate data quality. Our study emphasises the importance of data curation and quality when integrating IoT information: identifying and discarding low-quality data that obstruct meaningful insights and introduce errors in decision-making. We evaluated our approach in a smart farm scenario using agricultural humidity and temperature data collected from various types of sensors. The proposed model exhibited consistent results in offline and online (stream data) scenarios. In addition, a performance evaluation was carried out, demonstrating its effectiveness. In summary, this article contributes a usable and effective IoT-based Big Data pipeline that provides data curation capabilities and assesses data usability in both online and offline scenarios. Additionally, it introduces customisable decision models for measuring data quality across multiple dimensions. |
format | Online Article Text |
id | pubmed-10557504 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10557504 2023-10-07 ELI: an IoT-aware big data pipeline with data curation and data quality de Haro-Olmo, Francisco José Valencia-Parra, Alvaro Varela-Vaca, Ángel Jesús Álvarez-Bermejo, José Antonio Gómez-López, María Teresa PeerJ Comput Sci Data Science PeerJ Inc. 2023-10-02 /pmc/articles/PMC10557504/ /pubmed/37810363 http://dx.doi.org/10.7717/peerj-cs.1605 Text en ©2023 de Haro-Olmo et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
title | ELI: an IoT-aware big data pipeline with data curation and data quality |
topic | Data Science |