Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment
The High Energy Physics community has been developing dedicated solutions for processing experiment data over decades. However, with recent advancements in Big Data and Cloud Services, a question of application of such technologies in the domain of physics data analysis becomes relevant. In this pap...
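Main authors: | Avati, Valentina; Blaszkiewicz, Milosz; Bocchi, Enrico; Canali, Luca; Castro, Diogo; Cervantes, Javier; Grzanka, Leszek; Guiraud, Enrico; Kaspar, Jan; Kothuri, Prasanth; Lamanna, Massimo; Malawski, Maciej; Mnich, Aleksandra; Moscicki, Jakub; Murali, Shravan; Piparo, Danilo; Tejedor, Enric |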
---|---|
Language: | eng |
Published: | 2019 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.1109/UCC-Companion.2018.00018 http://cds.cern.ch/record/2706003 |
_version_ | 1780964886183936000 |
---|---|
author | Avati, Valentina; Blaszkiewicz, Milosz; Bocchi, Enrico; Canali, Luca; Castro, Diogo; Cervantes, Javier; Grzanka, Leszek; Guiraud, Enrico; Kaspar, Jan; Kothuri, Prasanth; Lamanna, Massimo; Malawski, Maciej; Mnich, Aleksandra; Moscicki, Jakub; Murali, Shravan; Piparo, Danilo; Tejedor, Enric |
author_facet | Avati, Valentina; Blaszkiewicz, Milosz; Bocchi, Enrico; Canali, Luca; Castro, Diogo; Cervantes, Javier; Grzanka, Leszek; Guiraud, Enrico; Kaspar, Jan; Kothuri, Prasanth; Lamanna, Massimo; Malawski, Maciej; Mnich, Aleksandra; Moscicki, Jakub; Murali, Shravan; Piparo, Danilo; Tejedor, Enric |
author_sort | Avati, Valentina |
collection | CERN |
description | The High Energy Physics community has been developing dedicated solutions for processing experiment data over decades. However, with recent advancements in Big Data and Cloud Services, a question of application of such technologies in the domain of physics data analysis becomes relevant. In this paper, we present our initial experience with a system that combines the use of public cloud infrastructure (Helix Nebula Science Cloud), storage and processing services developed by CERN, and off-the-shelf Big Data frameworks. The system is completely decoupled from CERN main computing facilities and provides an interactive web-based interface based on Jupyter Notebooks as the main entry-point for the users. We run a sample analysis on 4.7 TB of data from the TOTEM experiment, rewriting the analysis code to leverage the PyRoot and RDataFrame model and to take full advantage of the parallel processing capabilities offered by Apache Spark. We report on the experience collected by embracing this new analysis model: preliminary scalability results show the processing time of our dataset can be reduced from 13 hrs on a single core to 7 mins on 248 cores. |
id | oai-inspirehep.net-1744687 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2019 |
record_format | invenio |
spelling | oai-inspirehep.net-1744687; 2020-01-21T14:11:07Z; doi:10.1109/UCC-Companion.2018.00018; http://cds.cern.ch/record/2706003; eng; Avati, Valentina; Blaszkiewicz, Milosz; Bocchi, Enrico; Canali, Luca; Castro, Diogo; Cervantes, Javier; Grzanka, Leszek; Guiraud, Enrico; Kaspar, Jan; Kothuri, Prasanth; Lamanna, Massimo; Malawski, Maciej; Mnich, Aleksandra; Moscicki, Jakub; Murali, Shravan; Piparo, Danilo; Tejedor, Enric; Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment; Computing and Computers; The High Energy Physics community has been developing dedicated solutions for processing experiment data over decades. However, with recent advancements in Big Data and Cloud Services, a question of application of such technologies in the domain of physics data analysis becomes relevant. In this paper, we present our initial experience with a system that combines the use of public cloud infrastructure (Helix Nebula Science Cloud), storage and processing services developed by CERN, and off-the-shelf Big Data frameworks. The system is completely decoupled from CERN main computing facilities and provides an interactive web-based interface based on Jupyter Notebooks as the main entry-point for the users. We run a sample analysis on 4.7 TB of data from the TOTEM experiment, rewriting the analysis code to leverage the PyRoot and RDataFrame model and to take full advantage of the parallel processing capabilities offered by Apache Spark. We report on the experience collected by embracing this new analysis model: preliminary scalability results show the processing time of our dataset can be reduced from 13 hrs on a single core to 7 mins on 248 cores.; oai:inspirehep.net:1744687; 2019 |
spellingShingle | Computing and Computers; Avati, Valentina; Blaszkiewicz, Milosz; Bocchi, Enrico; Canali, Luca; Castro, Diogo; Cervantes, Javier; Grzanka, Leszek; Guiraud, Enrico; Kaspar, Jan; Kothuri, Prasanth; Lamanna, Massimo; Malawski, Maciej; Mnich, Aleksandra; Moscicki, Jakub; Murali, Shravan; Piparo, Danilo; Tejedor, Enric; Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment |
title | Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment |
title_full | Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment |
title_fullStr | Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment |
title_full_unstemmed | Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment |
title_short | Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment |
title_sort | big data tools and cloud services for high energy physics analysis in totem experiment |
topic | Computing and Computers |
url | https://dx.doi.org/10.1109/UCC-Companion.2018.00018 http://cds.cern.ch/record/2706003 |
work_keys_str_mv | AT avativalentina bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT blaszkiewiczmilosz bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT bocchienrico bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT canaliluca bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT castrodiogo bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT cervantesjavier bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT grzankaleszek bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT guiraudenrico bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT kasparjan bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT kothuriprasanth bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT lamannamassimo bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT malawskimaciej bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT mnichaleksandra bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT moscickijakub bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT muralishravan bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT piparodanilo bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment AT tejedorenric bigdatatoolsandcloudservicesforhighenergyphysicsanalysisintotemexperiment |