
Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment

Bibliographic Details
Main authors: Avati, Valentina; Blaszkiewicz, Milosz; Bocchi, Enrico; Canali, Luca; Castro, Diogo; Cervantes, Javier; Grzanka, Leszek; Guiraud, Enrico; Kaspar, Jan; Kothuri, Prasanth; Lamanna, Massimo; Malawski, Maciej; Mnich, Aleksandra; Moscicki, Jakub; Murali, Shravan; Piparo, Danilo; Tejedor, Enric
Language: English
Published: 2019
Subjects:
Online access: https://dx.doi.org/10.1109/UCC-Companion.2018.00018
http://cds.cern.ch/record/2706003
Description
Summary: The High Energy Physics community has spent decades developing dedicated solutions for processing experiment data. With recent advances in Big Data and Cloud Services, however, the question of whether such technologies can be applied to physics data analysis has become relevant. In this paper, we present our initial experience with a system that combines public cloud infrastructure (Helix Nebula Science Cloud), storage and processing services developed by CERN, and off-the-shelf Big Data frameworks. The system is completely decoupled from the main CERN computing facilities and provides an interactive web interface based on Jupyter Notebooks as the main entry point for users. We run a sample analysis on 4.7 TB of data from the TOTEM experiment, rewriting the analysis code to use PyROOT and the RDataFrame model and to take full advantage of the parallel processing capabilities of Apache Spark. We report on the experience gained by adopting this new analysis model: preliminary scalability results show that the processing time for our dataset can be reduced from 13 hours on a single core to 7 minutes on 248 cores.
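
As a rough illustration of the scalability figures quoted in the summary, the sketch below computes the speedup and parallel efficiency they imply. It assumes that the rounded values from the abstract (13 hours, 7 minutes, 248 cores) can be taken at face value and that the single-core run is the baseline. The function and constant names are illustrative and do not come from the paper.

```python
def speedup(baseline_minutes: float, parallel_minutes: float) -> float:
    """Ratio of single-core wall time to parallel wall time."""
    return baseline_minutes / parallel_minutes


def parallel_efficiency(speedup_factor: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved on the given core count."""
    return speedup_factor / cores


# Figures as quoted in the summary (rounded values, assumed exact here).
BASELINE_MINUTES = 13 * 60   # 13 hours on a single core
PARALLEL_MINUTES = 7         # 7 minutes on the cluster
CORES = 248

s = speedup(BASELINE_MINUTES, PARALLEL_MINUTES)
e = parallel_efficiency(s, CORES)
print(f"speedup: {s:.1f}x, efficiency: {e:.0%}")
```

Under these assumptions the reported times correspond to a speedup of roughly 111x, or about 45% of ideal linear scaling on 248 cores.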