
Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case


Bibliographic Details
Main Authors: Tedeschi, Tommaso; Padulano, Vincenzo Eduardo; Spiga, Daniele; Ciangottini, Diego; Tracolli, Mirco; Tejedor Saavedra, Enric; Guiraud, Enrico; Biasotto, Massimo
Language: eng
Published: 2023
Subjects: hep-ex (Particle Physics - Experiment); cs.CE, cs.DC (Computing and Computers)
Online Access: https://dx.doi.org/10.1016/j.cpc.2023.108965
http://cds.cern.ch/record/2866753
_version_ 1780978119779287040
author Tedeschi, Tommaso
Padulano, Vincenzo Eduardo
Spiga, Daniele
Ciangottini, Diego
Tracolli, Mirco
Tejedor Saavedra, Enric
Guiraud, Enrico
Biasotto, Massimo
author_facet Tedeschi, Tommaso
Padulano, Vincenzo Eduardo
Spiga, Daniele
Ciangottini, Diego
Tracolli, Mirco
Tejedor Saavedra, Enric
Guiraud, Enrico
Biasotto, Massimo
author_sort Tedeschi, Tommaso
collection CERN
description The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, give LHC experiments a strong motivation to rethink their computing models at many levels. Great effort has been put into optimizing computing resource utilization for data analysis, leading to both lower hardware requirements and faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering modern interfaces on the one hand and, on the other, hiding the complexity of the underlying environment, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources such as public clouds or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources on different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order-of-magnitude speedup with respect to the previous approach.
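
The declarative model described above can be illustrated with a short RDataFrame sketch. This is a minimal, hypothetical example, not code from the paper: the tree name, input file, and columns ("Events", "nanoaod_sample.root", nMuon, Muon_pt) are placeholders chosen for illustration.

import ROOT

# Minimal sketch of a declarative RDataFrame analysis; dataset and column
# names below are placeholders, not taken from the paper.
df = ROOT.RDataFrame("Events", "nanoaod_sample.root")

# The analysis is declared as a chain of transformations; no explicit event
# loop is written, and nothing runs until a result is actually requested.
h = (
    df.Filter("nMuon >= 2", "at least two muons")
      .Define("leading_muon_pt", "Muon_pt[0]")
      .Histo1D(("pt", "Leading muon p_{T};p_{T} [GeV];Events", 64, 0.0, 200.0),
               "leading_muon_pt")
)

# The single event loop is triggered here, when the histogram is first used.
canvas = ROOT.TCanvas()
h.Draw()
canvas.SaveAs("leading_muon_pt.png")

In a legacy iterative approach the same selection would require an explicit loop over events and manual histogram filling; here the framework schedules one event loop for all declared results.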
id cern-2866753
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2023
record_format invenio
spelling cern-2866753 2023-10-19T02:16:40Z
doi:10.1016/j.cpc.2023.108965
http://cds.cern.ch/record/2866753
eng
Tedeschi, Tommaso; Padulano, Vincenzo Eduardo; Spiga, Daniele; Ciangottini, Diego; Tracolli, Mirco; Tejedor Saavedra, Enric; Guiraud, Enrico; Biasotto, Massimo
Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
hep-ex (Particle Physics - Experiment); cs.CE (Computing and Computers); cs.DC (Computing and Computers)
The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, give LHC experiments a strong motivation to rethink their computing models at many levels. Great effort has been put into optimizing computing resource utilization for data analysis, leading to both lower hardware requirements and faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering modern interfaces on the one hand and, on the other, hiding the complexity of the underlying environment, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources such as public clouds or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources on different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order-of-magnitude speedup with respect to the previous approach.
arXiv:2307.12579 oai:cds.cern.ch:2866753 2023-07-24
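
The distributed execution over HTCondor-based resources mentioned in the abstract can be sketched with ROOT's experimental distributed RDataFrame backend together with Dask. This is a hypothetical sketch, not the facility's actual configuration: the module path and keyword arguments (ROOT.RDF.Experimental.Distributed.Dask.RDataFrame, daskclient, dask_jobqueue.HTCondorCluster) follow recent ROOT and dask-jobqueue releases and may differ between versions; resource sizes and the input file URL are placeholders.

import ROOT
from dask.distributed import Client
from dask_jobqueue import HTCondorCluster

# Provision Dask workers as HTCondor jobs (hypothetical resource sizes).
cluster = HTCondorCluster(cores=4, memory="8 GB", disk="4 GB")
cluster.scale(jobs=20)  # request 20 HTCondor worker jobs
client = Client(cluster)

# ROOT's experimental distributed RDataFrame, Dask backend (assumed API).
DistRDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
df = DistRDataFrame("Events",
                    ["root://eos.example.org//store/sample.root"],  # placeholder
                    daskclient=client)

# The declarative code is unchanged with respect to the local case; the
# backend splits the input into ranges, runs them on the workers, and merges
# the partial results.
h = (
    df.Filter("nMuon >= 2")
      .Define("leading_muon_pt", "Muon_pt[0]")
      .Histo1D(("pt", "Leading muon p_{T};p_{T} [GeV];Events", 64, 0.0, 200.0),
               "leading_muon_pt")
)
hist = h.GetValue()  # triggers the distributed event loop
print("Selected entries:", hist.GetEntries())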
spellingShingle hep-ex
Particle Physics - Experiment
cs.CE
Computing and Computers
cs.DC
Computing and Computers
Tedeschi, Tommaso
Padulano, Vincenzo Eduardo
Spiga, Daniele
Ciangottini, Diego
Tracolli, Mirco
Tejedor Saavedra, Enric
Guiraud, Enrico
Biasotto, Massimo
Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
title Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
title_full Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
title_fullStr Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
title_full_unstemmed Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
title_short Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
title_sort prototyping a root-based distributed analysis workflow for hl-lhc: the cms use case
topic hep-ex
Particle Physics - Experiment
cs.CE
Computing and Computers
cs.DC
Computing and Computers
url https://dx.doi.org/10.1016/j.cpc.2023.108965
http://cds.cern.ch/record/2866753
work_keys_str_mv AT tedeschitommaso prototypingarootbaseddistributedanalysisworkflowforhllhcthecmsusecase
AT padulanovincenzoeduardo prototypingarootbaseddistributedanalysisworkflowforhllhcthecmsusecase
AT spigadaniele prototypingarootbaseddistributedanalysisworkflowforhllhcthecmsusecase
AT ciangottinidiego prototypingarootbaseddistributedanalysisworkflowforhllhcthecmsusecase
AT tracollimirco prototypingarootbaseddistributedanalysisworkflowforhllhcthecmsusecase
AT tejedorsaavedraenric prototypingarootbaseddistributedanalysisworkflowforhllhcthecmsusecase
AT guiraudenrico prototypingarootbaseddistributedanalysisworkflowforhllhcthecmsusecase
AT biasottomassimo prototypingarootbaseddistributedanalysisworkflowforhllhcthecmsusecase