Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds

We describe a novel approach for experimental High-Energy Physics (HEP) data analyses that is centred around the declarative rather than imperative paradigm when describing analysis computational tasks. The analysis process can be structured in the form of a Directed Acyclic Graph (DAG), where each...

Bibliographic Details
Main authors: Šimko, Tibor, Heinrich, Lukas Alexander, Lange, Clemens, Lintuluoto, Adelina Eleonora, MacDonell, Danika Marina, Mečionis, Audrius, Rodríguez Rodríguez, Diego, Shandilya, Parth, Vidal García, Marco
Language: eng
Published: 2021
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.3389/fdata.2021.661501
http://cds.cern.ch/record/2773276
author Šimko, Tibor
Heinrich, Lukas Alexander
Lange, Clemens
Lintuluoto, Adelina Eleonora
MacDonell, Danika Marina
Mečionis, Audrius
Rodríguez Rodríguez, Diego
Shandilya, Parth
Vidal García, Marco
collection CERN
description We describe a novel approach for experimental High-Energy Physics (HEP) data analyses that is centred around the declarative rather than imperative paradigm when describing analysis computational tasks. The analysis process can be structured in the form of a Directed Acyclic Graph (DAG), where each graph vertex represents a unit of computation with its inputs and outputs, and the graph edges describe the interconnection of various computational steps. We have developed REANA, a platform for reproducible data analyses, that supports several such DAG workflow specifications. The REANA platform parses the analysis workflow and dispatches its computational steps to various supported computing backends (Kubernetes, HTCondor, Slurm). The focus on declarative rather than imperative programming enables researchers to concentrate on the problem domain at hand without having to think about implementation details such as scalable job orchestration. The declarative programming approach is further exemplified by a multi-level job cascading paradigm that was implemented in the Yadage workflow specification language. We present two recent LHC particle physics analyses, ATLAS searches for dark matter and CMS jet energy correction pipelines, where the declarative approach was successfully applied. We argue that the declarative approach to data analyses, combined with recent advancements in container technology, facilitates the portability of computational data analyses to various compute backends, enhancing the reproducibility and the knowledge preservation behind particle physics data analyses.
id oai-inspirehep.net-1868244
institution European Organization for Nuclear Research
language eng
publishDate 2021
record_format invenio
title Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds
topic Computing and Computers
url https://dx.doi.org/10.3389/fdata.2021.661501
http://cds.cern.ch/record/2773276