Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment

The High Energy Physics community has been developing dedicated solutions for processing experiment data for decades. However, with recent advances in Big Data and Cloud Services, the question of applying such technologies to physics data analysis has become relevant. In this pap...

Bibliographic Details
Main authors: Avati, Valentina; Blaszkiewicz, Milosz; Bocchi, Enrico; Canali, Luca; Castro, Diogo; Cervantes, Javier; Grzanka, Leszek; Guiraud, Enrico; Kaspar, Jan; Kothuri, Prasanth; Lamanna, Massimo; Malawski, Maciej; Mnich, Aleksandra; Moscicki, Jakub; Murali, Shravan; Piparo, Danilo; Tejedor, Enric
Language: eng
Published: 2019
Subjects:
Online access: https://dx.doi.org/10.1109/UCC-Companion.2018.00018
http://cds.cern.ch/record/2706003
author Avati, Valentina
Blaszkiewicz, Milosz
Bocchi, Enrico
Canali, Luca
Castro, Diogo
Cervantes, Javier
Grzanka, Leszek
Guiraud, Enrico
Kaspar, Jan
Kothuri, Prasanth
Lamanna, Massimo
Malawski, Maciej
Mnich, Aleksandra
Moscicki, Jakub
Murali, Shravan
Piparo, Danilo
Tejedor, Enric
collection CERN
description The High Energy Physics community has been developing dedicated solutions for processing experiment data for decades. However, with recent advances in Big Data and Cloud Services, the question of applying such technologies to physics data analysis has become relevant. In this paper, we present our initial experience with a system that combines public cloud infrastructure (Helix Nebula Science Cloud), storage and processing services developed by CERN, and off-the-shelf Big Data frameworks. The system is completely decoupled from CERN's main computing facilities and provides an interactive web interface based on Jupyter Notebooks as the main entry point for users. We run a sample analysis on 4.7 TB of data from the TOTEM experiment, rewriting the analysis code to use PyROOT and the RDataFrame model and to take full advantage of the parallel processing capabilities offered by Apache Spark. We report on the experience gained with this new analysis model: preliminary scalability results show that the processing time for our dataset can be reduced from 13 hours on a single core to 7 minutes on 248 cores.
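As a rough illustration of the scalability figures quoted in the abstract, the snippet below derives the implied speedup and parallel efficiency. It is only a sketch: the inputs are the rounded values from the abstract (13 hours, 7 minutes, 248 cores), the function name `speedup_and_efficiency` is hypothetical, and the results are therefore approximate rather than figures reported by the authors.

```python
def speedup_and_efficiency(serial_minutes, parallel_minutes, cores):
    """Return (speedup, parallel efficiency) for a strong-scaling run.

    speedup    = serial time / parallel time
    efficiency = speedup / number of cores (1.0 would be ideal scaling)
    """
    speedup = serial_minutes / parallel_minutes
    efficiency = speedup / cores
    return speedup, efficiency


# Rounded values taken from the abstract; not measured here.
serial_minutes = 13 * 60   # 13 hours on a single core
parallel_minutes = 7       # 7 minutes on the cluster
cores = 248

speedup, efficiency = speedup_and_efficiency(serial_minutes, parallel_minutes, cores)
print(f"speedup ~ {speedup:.1f}x, parallel efficiency ~ {efficiency:.0%}")
```

With these inputs the speedup is about 111x, which corresponds to a parallel efficiency of roughly 45% on 248 cores.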
id oai-inspirehep.net-1744687
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2019
record_format invenio
title Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment
topic Computing and Computers
url https://dx.doi.org/10.1109/UCC-Companion.2018.00018
http://cds.cern.ch/record/2706003