
A non-intrusive and reactive architecture to support real-time ETL processes in data warehousing environments

Organizations increasingly gather data to support strategic decision-making. These data are available in operational sources, which are distributed, heterogeneous, and autonomous. They are gathered through ETL processes, which traditionally run at predefined times, that is, once a...


Bibliographic Details
Main authors: de Assis Vilela, Flávio, Times, Valéria Cesário, de Campos Bernardi, Alberto Carlos, de Paula Freitas, Augusto, Ciferri, Ricardo Rodrigues
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10196447/
https://www.ncbi.nlm.nih.gov/pubmed/37215774
http://dx.doi.org/10.1016/j.heliyon.2023.e15728
_version_ 1785044356087939072
author de Assis Vilela, Flávio
Times, Valéria Cesário
de Campos Bernardi, Alberto Carlos
de Paula Freitas, Augusto
Ciferri, Ricardo Rodrigues
author_facet de Assis Vilela, Flávio
Times, Valéria Cesário
de Campos Bernardi, Alberto Carlos
de Paula Freitas, Augusto
Ciferri, Ricardo Rodrigues
author_sort de Assis Vilela, Flávio
collection PubMed
description Organizations increasingly gather data to support strategic decision-making. These data are available in operational sources, which are distributed, heterogeneous, and autonomous. They are gathered through ETL processes, which traditionally run at predefined times: once a day, once a week, once a month, or over a specific period. However, some applications, such as health systems and digital agriculture, need data sooner, sometimes immediately after they are generated in the operational data sources. The conventional ETL process and the available techniques cannot deliver operational data in real time with low latency, high availability, and scalability. We present an innovative architecture, named Data Magnet, to cope with real-time ETL processes. Experimental tests performed in the digital agriculture domain using real and synthetic data showed that our proposal was able to handle the ETL process in real time. Data Magnet performed well, showing an almost constant elapsed time for growing data volumes. In addition, Data Magnet provided significant performance gains over the traditional trigger technique.
format Online
Article
Text
id pubmed-10196447
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10196447 2023-05-20 A non-intrusive and reactive architecture to support real-time ETL processes in data warehousing environments de Assis Vilela, Flávio Times, Valéria Cesário de Campos Bernardi, Alberto Carlos de Paula Freitas, Augusto Ciferri, Ricardo Rodrigues Heliyon Research Article Organizations increasingly gather data to support strategic decision-making. These data are available in operational sources, which are distributed, heterogeneous, and autonomous. They are gathered through ETL processes, which traditionally run at predefined times: once a day, once a week, once a month, or over a specific period. However, some applications, such as health systems and digital agriculture, need data sooner, sometimes immediately after they are generated in the operational data sources. The conventional ETL process and the available techniques cannot deliver operational data in real time with low latency, high availability, and scalability. We present an innovative architecture, named Data Magnet, to cope with real-time ETL processes. Experimental tests performed in the digital agriculture domain using real and synthetic data showed that our proposal was able to handle the ETL process in real time. Data Magnet performed well, showing an almost constant elapsed time for growing data volumes. In addition, Data Magnet provided significant performance gains over the traditional trigger technique. Elsevier 2023-04-26 /pmc/articles/PMC10196447/ /pubmed/37215774 http://dx.doi.org/10.1016/j.heliyon.2023.e15728 Text en © 2023 Published by Elsevier Ltd. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Research Article
de Assis Vilela, Flávio
Times, Valéria Cesário
de Campos Bernardi, Alberto Carlos
de Paula Freitas, Augusto
Ciferri, Ricardo Rodrigues
A non-intrusive and reactive architecture to support real-time ETL processes in data warehousing environments
title A non-intrusive and reactive architecture to support real-time ETL processes in data warehousing environments
title_sort non-intrusive and reactive architecture to support real-time etl processes in data warehousing environments
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10196447/
https://www.ncbi.nlm.nih.gov/pubmed/37215774
http://dx.doi.org/10.1016/j.heliyon.2023.e15728