
Performance Improvements of EventIndex Distributed System at CERN


Bibliographic Details
Main author: Fernandez Casani, Alvaro
Language: English
Published: 2023
Subjects: Computing and Computers; Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/2852032
author Fernandez Casani, Alvaro
collection CERN
description The work presented in this thesis is framed in the context of the EventIndex project of the ATLAS experiment, a large particle detector at CERN's LHC (Large Hadron Collider). The objective of the project is to catalog all the particle collisions, or events, both recorded by the ATLAS detector and simulated, over the duration of the experiment. With this catalog, data can be characterized at event granularity, which is important for end users searching for and locating events. Other automatic checks can be performed in the data reprocessing chain to ensure its correctness and to optimize future processing. Due to the increase in production rates and in the total volume of data expected for Run 3 (2022-2025) and the HL-LHC (end of the 2020s), a scalable system is required, one that also simplifies the previous implementations. In this thesis we present our contributions to the project in the areas of distributed data collection, storage of massive volumes of data, and access to them. A small quantity of information (metadata) per event is indexed at CERN (Tier-0) and, in a distributed way, worldwide on the grid, across all the computing centers that are part of the ATLAS experiment (10 Tier-1 and around 70 Tier-2 centers). We present a new pull model for data collection on the grid, with an object store as temporary storage, from which the data can be dynamically retrieved to be ingested into the final backend. We also present our contributions to a new big data store using HBase/Phoenix, able to sustain the required data rates and total volume of data, and that addresses the limitations of the previous hybrid solutions. Finally, we present a computing framework and tools based on Spark for data access and for solving analytic use-case workloads that access large amounts of data, such as overlap calculations or duplicate event detection.
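As a rough illustration of the two analytic workloads named in the abstract (overlap calculation and duplicate event detection), the following is a minimal plain-Python sketch, not the thesis's Spark implementation. It assumes, hypothetically, that each event is identified by a (run number, event number) pair; `EventId`, `find_duplicates`, `overlap` and `overlap_fraction` are illustrative names that are not part of EventIndex.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Set


class EventId(NamedTuple):
    """Hypothetical event key (assumption): a (run number, event number) pair."""
    run: int
    event: int


def find_duplicates(events: Iterable[EventId]) -> Dict[EventId, int]:
    """Return every event ID seen more than once, mapped to its count."""
    counts = Counter(events)
    return {eid: n for eid, n in counts.items() if n > 1}


def overlap(a: Iterable[EventId], b: Iterable[EventId]) -> Set[EventId]:
    """Return the distinct event IDs present in both datasets."""
    return set(a) & set(b)


def overlap_fraction(a: Iterable[EventId], b: Iterable[EventId]) -> float:
    """Fraction of the distinct events in `a` that also appear in `b`."""
    distinct_a = set(a)
    if not distinct_a:
        return 0.0  # avoid division by zero for an empty dataset
    return len(distinct_a & set(b)) / len(distinct_a)


if __name__ == "__main__":
    # Two small, made-up datasets; EventId(1, 11) is duplicated in ds1.
    ds1 = [EventId(1, 10), EventId(1, 11), EventId(1, 11), EventId(2, 5)]
    ds2 = [EventId(1, 11), EventId(2, 5), EventId(3, 1)]
    print(find_duplicates(ds1))       # {EventId(run=1, event=11): 2}
    print(sorted(overlap(ds1, ds2)))  # [EventId(run=1, event=11), EventId(run=2, event=5)]
```

In a distributed Spark setting such as the one the abstract refers to, the same logic would typically map onto a group-by on the event key with a count filter (duplicates) and a join or intersection on that key (overlaps); this sketch only shows the set logic on in-memory lists.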
id cern-2852032
institution European Organization for Nuclear Research
language eng
publishDate 2023
record_format invenio
spelling cern-2852032; 2023-03-16T19:22:51Z; http://cds.cern.ch/record/2852032; eng; Fernandez Casani, Alvaro; Performance Improvements of EventIndex Distributed System at CERN; Computing and Computers; Detectors and Experimental Techniques; CERN-THESIS-2023-016; oai:cds.cern.ch:2852032; 2023-03-09T14:29:42Z
title Performance Improvements of EventIndex Distributed System at CERN
topic Computing and Computers
Detectors and Experimental Techniques
url http://cds.cern.ch/record/2852032