
Operationalizing and automating Data Governance

The ability to cross data from multiple sources represents a competitive advantage for organizations. Yet, the governance of the data lifecycle, from the data sources into valuable insights, is largely performed in an ad-hoc or manual manner. This is specifically concerning in scenarios where tens o...


Bibliographic Details
Main Authors: Nadal, Sergi, Jovanovic, Petar, Bilalli, Besim, Romero, Oscar
Format: Online Article Text
Language: English
Published: Springer International Publishing 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9736715/
https://www.ncbi.nlm.nih.gov/pubmed/36532842
http://dx.doi.org/10.1186/s40537-022-00673-5
_version_ 1784847101693263872
author Nadal, Sergi
Jovanovic, Petar
Bilalli, Besim
Romero, Oscar
author_facet Nadal, Sergi
Jovanovic, Petar
Bilalli, Besim
Romero, Oscar
author_sort Nadal, Sergi
collection PubMed
description The ability to cross data from multiple sources represents a competitive advantage for organizations. Yet, the governance of the data lifecycle, from the data sources into valuable insights, is largely performed in an ad-hoc or manual manner. This is specifically concerning in scenarios where tens or hundreds of continuously evolving data sources produce semi-structured data. To overcome this challenge, we develop a framework for operationalizing and automating data governance. For the first, we propose a zoned data lake architecture and a set of data governance processes that allow the systematic ingestion, transformation and integration of data from heterogeneous sources, in order to make them readily available for business users. For the second, we propose a set of metadata artifacts that allow the automatic execution of data governance processes, addressing a wide range of data management challenges. We showcase the usefulness of the proposed approach using a real world use case, stemming from the collaborative project with the World Health Organization for the management and analysis of data about Neglected Tropical Diseases. Overall, this work contributes on facilitating organizations the adoption of data-driven strategies into a cohesive framework operationalizing and automating data governance.
format Online
Article
Text
id pubmed-9736715
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-97367152022-12-12 Operationalizing and automating Data Governance Nadal, Sergi Jovanovic, Petar Bilalli, Besim Romero, Oscar J Big Data Research The ability to cross data from multiple sources represents a competitive advantage for organizations. Yet, the governance of the data lifecycle, from the data sources into valuable insights, is largely performed in an ad-hoc or manual manner. This is specifically concerning in scenarios where tens or hundreds of continuously evolving data sources produce semi-structured data. To overcome this challenge, we develop a framework for operationalizing and automating data governance. For the first, we propose a zoned data lake architecture and a set of data governance processes that allow the systematic ingestion, transformation and integration of data from heterogeneous sources, in order to make them readily available for business users. For the second, we propose a set of metadata artifacts that allow the automatic execution of data governance processes, addressing a wide range of data management challenges. We showcase the usefulness of the proposed approach using a real world use case, stemming from the collaborative project with the World Health Organization for the management and analysis of data about Neglected Tropical Diseases. Overall, this work contributes on facilitating organizations the adoption of data-driven strategies into a cohesive framework operationalizing and automating data governance. 
Springer International Publishing 2022-12-10 2022 /pmc/articles/PMC9736715/ /pubmed/36532842 http://dx.doi.org/10.1186/s40537-022-00673-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Research
Nadal, Sergi
Jovanovic, Petar
Bilalli, Besim
Romero, Oscar
Operationalizing and automating Data Governance
title Operationalizing and automating Data Governance
title_full Operationalizing and automating Data Governance
title_fullStr Operationalizing and automating Data Governance
title_full_unstemmed Operationalizing and automating Data Governance
title_short Operationalizing and automating Data Governance
title_sort operationalizing and automating data governance
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9736715/
https://www.ncbi.nlm.nih.gov/pubmed/36532842
http://dx.doi.org/10.1186/s40537-022-00673-5
work_keys_str_mv AT nadalsergi operationalizingandautomatingdatagovernance
AT jovanovicpetar operationalizingandautomatingdatagovernance
AT bilallibesim operationalizingandautomatingdatagovernance
AT romerooscar operationalizingandautomatingdatagovernance