A Serverless Engine for High Energy Physics Distributed Analysis

The Large Hadron Collider (LHC) at CERN has generated in the last decade an unprecedented volume of data for the High-Energy Physics (HEP) field. Scientific collaborations interested in analysing such data very often require computing power beyond a single machine. This issue has been tackled tradit...

Bibliographic Details
Main authors: Kuśnierz, Jacek, Padulano, Vincenzo Eduardo, Malawski, Maciej, Burkiewicz, Kamil, Saavedra, Enric Tejedor, Alonso-Jordá, Pedro, Pitt, Michael, Avati, Valentina
Language: eng
Published: 2022
Subjects:
Online access: https://dx.doi.org/10.1109/CCGrid54584.2022.00067
http://cds.cern.ch/record/2815205
_version_ 1780973496038326272
author Kuśnierz, Jacek
Padulano, Vincenzo Eduardo
Malawski, Maciej
Burkiewicz, Kamil
Saavedra, Enric Tejedor
Alonso-Jordá, Pedro
Pitt, Michael
Avati, Valentina
author_facet Kuśnierz, Jacek
Padulano, Vincenzo Eduardo
Malawski, Maciej
Burkiewicz, Kamil
Saavedra, Enric Tejedor
Alonso-Jordá, Pedro
Pitt, Michael
Avati, Valentina
author_sort Kuśnierz, Jacek
collection CERN
description The Large Hadron Collider (LHC) at CERN has generated an unprecedented volume of data for the High-Energy Physics (HEP) field over the last decade. Scientific collaborations interested in analysing such data very often require computing power beyond a single machine. This issue has traditionally been tackled by running analyses in distributed environments using stateful, managed batch computing systems. While this approach has been effective so far, current estimates for future computing needs of the field present large scaling challenges. Such a managed approach may not be the only viable way to tackle them, and serverless architectures could provide an interesting alternative with even larger scaling potential. This work describes a novel approach to running real HEP scientific applications through a distributed serverless computing engine. The engine is built upon ROOT, a well-established HEP data analysis software, and distributes its computations to a large pool of concurrent executions on Amazon Web Services Lambda Serverless Platform. Thanks to the developed tool, physicists are able to access datasets stored at CERN (including those under restricted access policies) and process them on remote infrastructures outside of their typical environment. The analysis of the serverless functions is monitored at runtime to gather performance metrics, both for data- and computation-intensive workloads.
id cern-2815205
institution European Organization for Nuclear Research
language eng
publishDate 2022
record_format invenio
spelling cern-2815205 2023-01-31T10:48:55Z doi:10.1109/CCGrid54584.2022.00067 http://cds.cern.ch/record/2815205 eng Kuśnierz, Jacek; Padulano, Vincenzo Eduardo; Malawski, Maciej; Burkiewicz, Kamil; Saavedra, Enric Tejedor; Alonso-Jordá, Pedro; Pitt, Michael; Avati, Valentina; A Serverless Engine for High Energy Physics Distributed Analysis; cs.DC; Computing and Computers; arXiv:2206.00942 oai:cds.cern.ch:2815205 2022-06-02
spellingShingle cs.DC
Computing and Computers
Kuśnierz, Jacek
Padulano, Vincenzo Eduardo
Malawski, Maciej
Burkiewicz, Kamil
Saavedra, Enric Tejedor
Alonso-Jordá, Pedro
Pitt, Michael
Avati, Valentina
A Serverless Engine for High Energy Physics Distributed Analysis
title A Serverless Engine for High Energy Physics Distributed Analysis
title_full A Serverless Engine for High Energy Physics Distributed Analysis
title_fullStr A Serverless Engine for High Energy Physics Distributed Analysis
title_full_unstemmed A Serverless Engine for High Energy Physics Distributed Analysis
title_short A Serverless Engine for High Energy Physics Distributed Analysis
title_sort serverless engine for high energy physics distributed analysis
topic cs.DC
Computing and Computers
url https://dx.doi.org/10.1109/CCGrid54584.2022.00067
http://cds.cern.ch/record/2815205