
Data streams processing in metadata integration system for HENP experiments

Bibliographic Details
Main Authors: Kaida, Anastasiia; Golosova, Marina; Grigoryeva, Maria; Aulov, Vasilii
Language: English
Published: 2019
Subjects:
Online Access: http://cds.cern.ch/record/2690991
Description
Summary: Heterogeneous metadata integration has become a widespread objective. Whenever it is addressed, numerous tasks must be solved, such as analysis of the data sources and development of the storage schema. No less important is the development of automated, configurable, and highly manageable ETL (data Extraction, Transformation, and Load) processes, along with tools for their automation, scheduling, management, and monitoring. This work describes the Metadata Integration and Topology Management System, initially designed as a subsystem of the Data Knowledge Base (DKB) developed for the ATLAS experiment. The core idea of the subsystem is to separate the features common to the majority of ETL processes from the implementation of particular tasks. The system is implemented as standalone modules, a supervisor and workers: the supervisor is responsible for building data streams through workers, each of which implements a specific operation of a particular process. The system is intended to considerably facilitate the organization of ongoing data integration operations with automated data stream processing.
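
The supervisor/worker separation described in the summary can be illustrated with a short sketch. The following Python example is a minimal illustration under assumed names, not the DKB implementation: Supervisor, Worker, ExtractFields, AddSource, and the "ProdSys2" tag are all invented for the example. The supervisor embodies the part common to every ETL process, driving the record stream through a configured chain of workers, while each worker implements one process-specific operation.

# Minimal sketch of the supervisor/worker pattern described above.
# All class and field names are illustrative assumptions, not DKB code.

from typing import Any, Dict, Iterable, List


class Worker:
    """Base interface: each worker implements one process-specific operation."""

    def process(self, record: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class ExtractFields(Worker):
    """Example worker: keep only the fields a given ETL process needs."""

    def __init__(self, fields: List[str]) -> None:
        self.fields = fields

    def process(self, record: Dict[str, Any]) -> Dict[str, Any]:
        return {k: record[k] for k in self.fields if k in record}


class AddSource(Worker):
    """Example worker: tag each record with the name of its data source."""

    def __init__(self, source: str) -> None:
        self.source = source

    def process(self, record: Dict[str, Any]) -> Dict[str, Any]:
        record["source"] = self.source
        return record


class Supervisor:
    """Common part: builds the data stream by chaining workers,
    independently of what each individual worker does."""

    def __init__(self, workers: List[Worker]) -> None:
        self.workers = workers

    def run(self, stream: Iterable[Dict[str, Any]]) -> Iterable[Dict[str, Any]]:
        for record in stream:
            for worker in self.workers:
                record = worker.process(record)
            yield record


if __name__ == "__main__":
    # Configure one particular ETL process as an ordered chain of workers.
    pipeline = Supervisor([ExtractFields(["task_id", "status"]),
                           AddSource("ProdSys2")])
    incoming = [{"task_id": 1, "status": "done", "extra": "ignored"}]
    for rec in pipeline.run(incoming):
        print(rec)  # {'task_id': 1, 'status': 'done', 'source': 'ProdSys2'}

In this arrangement, adapting the system to a new integration task means writing new workers and reconfiguring the chain, while the supervisor's stream-building logic stays unchanged.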