Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds
We describe a novel approach for experimental High-Energy Physics (HEP) data analyses that is centred around the declarative rather than imperative paradigm when describing analysis computational tasks. The analysis process can be structured in the form of a Directed Acyclic Graph (DAG), where each graph vertex represents a unit of computation with its inputs and outputs, and the graph edges describe the interconnection of various computational steps. We have developed REANA, a platform for reproducible data analyses, that supports several such DAG workflow specifications. The REANA platform parses the analysis workflow and dispatches its computational steps to various supported computing backends (Kubernetes, HTCondor, Slurm). The focus on declarative rather than imperative programming enables researchers to concentrate on the problem domain at hand without having to think about implementation details such as scalable job orchestration. The declarative programming approach is further exemplified by a multi-level job cascading paradigm that was implemented in the Yadage workflow specification language. We present two recent LHC particle physics analyses, ATLAS searches for dark matter and CMS jet energy correction pipelines, where the declarative approach was successfully applied. We argue that the declarative approach to data analyses, combined with recent advancements in container technology, facilitates the portability of computational data analyses to various compute backends, enhancing the reproducibility and the knowledge preservation behind particle physics data analyses.
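To illustrate the declarative idea described in the abstract, the sketch below expresses a two-step analysis as plain data: each step declares its container image, command, inputs and outputs, and a minimal engine infers the DAG edges from file dependencies and dispatches the steps in topological order. This is a hypothetical Python sketch for illustration only, not REANA's or Yadage's actual specification format; the step names, image names and the `dispatch` placeholder are assumptions standing in for real job submission to Kubernetes, HTCondor or Slurm.

```python
# Illustrative sketch (not REANA's actual specification format): an analysis
# expressed declaratively as a DAG of containerised steps.  Each vertex names
# its inputs and outputs; the edges are inferred from file dependencies, and a
# tiny "engine" dispatches steps in topological order, standing in for the job
# orchestration that a platform such as REANA performs on a compute backend.
from graphlib import TopologicalSorter  # Python >= 3.9

# Declarative description: *what* each step needs and produces,
# not *how* or *where* it is executed.
steps = {
    "generate": {
        "image": "example/root:latest",        # hypothetical container image
        "command": "root -b -q gendata.C",
        "inputs": [],
        "outputs": ["data.root"],
    },
    "fit": {
        "image": "example/root:latest",
        "command": "root -b -q fitdata.C",
        "inputs": ["data.root"],
        "outputs": ["plot.png"],
    },
}

def build_dag(steps):
    """Infer graph edges by matching each input file to the step producing it."""
    producers = {f: name for name, s in steps.items() for f in s["outputs"]}
    return {
        name: {producers[f] for f in s["inputs"] if f in producers}
        for name, s in steps.items()
    }

def dispatch(name, step):
    """Placeholder for submitting a containerised job to a compute backend."""
    print(f"submit: {step['command']!r} in image {step['image']} ({name})")

if __name__ == "__main__":
    for name in TopologicalSorter(build_dag(steps)).static_order():
        dispatch(name, steps[name])
```

A real workflow engine would additionally stage the declared inputs into each job's workspace and translate every step into a Kubernetes job, an HTCondor submission or a Slurm batch script; this is exactly the orchestration detail that the declarative specification hides from the analyst.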
Main Authors: | Šimko, Tibor; Heinrich, Lukas Alexander; Lange, Clemens; Lintuluoto, Adelina Eleonora; MacDonell, Danika Marina; Mečionis, Audrius; Rodríguez Rodríguez, Diego; Shandilya, Parth; Vidal García, Marco |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Big Data |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8138321/ https://www.ncbi.nlm.nih.gov/pubmed/34027400 http://dx.doi.org/10.3389/fdata.2021.661501 |
_version_ | 1783695781529649152 |
---|---|
author | Šimko, Tibor; Heinrich, Lukas Alexander; Lange, Clemens; Lintuluoto, Adelina Eleonora; MacDonell, Danika Marina; Mečionis, Audrius; Rodríguez Rodríguez, Diego; Shandilya, Parth; Vidal García, Marco |
author_facet | Šimko, Tibor; Heinrich, Lukas Alexander; Lange, Clemens; Lintuluoto, Adelina Eleonora; MacDonell, Danika Marina; Mečionis, Audrius; Rodríguez Rodríguez, Diego; Shandilya, Parth; Vidal García, Marco |
author_sort | Šimko, Tibor |
collection | PubMed |
description | We describe a novel approach for experimental High-Energy Physics (HEP) data analyses that is centred around the declarative rather than imperative paradigm when describing analysis computational tasks. The analysis process can be structured in the form of a Directed Acyclic Graph (DAG), where each graph vertex represents a unit of computation with its inputs and outputs, and the graph edges describe the interconnection of various computational steps. We have developed REANA, a platform for reproducible data analyses, that supports several such DAG workflow specifications. The REANA platform parses the analysis workflow and dispatches its computational steps to various supported computing backends (Kubernetes, HTCondor, Slurm). The focus on declarative rather than imperative programming enables researchers to concentrate on the problem domain at hand without having to think about implementation details such as scalable job orchestration. The declarative programming approach is further exemplified by a multi-level job cascading paradigm that was implemented in the Yadage workflow specification language. We present two recent LHC particle physics analyses, ATLAS searches for dark matter and CMS jet energy correction pipelines, where the declarative approach was successfully applied. We argue that the declarative approach to data analyses, combined with recent advancements in container technology, facilitates the portability of computational data analyses to various compute backends, enhancing the reproducibility and the knowledge preservation behind particle physics data analyses. |
format | Online Article Text |
id | pubmed-8138321 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8138321 2021-05-22 Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds Šimko, Tibor; Heinrich, Lukas Alexander; Lange, Clemens; Lintuluoto, Adelina Eleonora; MacDonell, Danika Marina; Mečionis, Audrius; Rodríguez Rodríguez, Diego; Shandilya, Parth; Vidal García, Marco Front Big Data Big Data We describe a novel approach for experimental High-Energy Physics (HEP) data analyses that is centred around the declarative rather than imperative paradigm when describing analysis computational tasks. The analysis process can be structured in the form of a Directed Acyclic Graph (DAG), where each graph vertex represents a unit of computation with its inputs and outputs, and the graph edges describe the interconnection of various computational steps. We have developed REANA, a platform for reproducible data analyses, that supports several such DAG workflow specifications. The REANA platform parses the analysis workflow and dispatches its computational steps to various supported computing backends (Kubernetes, HTCondor, Slurm). The focus on declarative rather than imperative programming enables researchers to concentrate on the problem domain at hand without having to think about implementation details such as scalable job orchestration. The declarative programming approach is further exemplified by a multi-level job cascading paradigm that was implemented in the Yadage workflow specification language. We present two recent LHC particle physics analyses, ATLAS searches for dark matter and CMS jet energy correction pipelines, where the declarative approach was successfully applied. We argue that the declarative approach to data analyses, combined with recent advancements in container technology, facilitates the portability of computational data analyses to various compute backends, enhancing the reproducibility and the knowledge preservation behind particle physics data analyses. Frontiers Media S.A. 2021-05-07 /pmc/articles/PMC8138321/ /pubmed/34027400 http://dx.doi.org/10.3389/fdata.2021.661501 Text en Copyright © 2021 Šimko, Heinrich, Lange, Lintuluoto, MacDonell, Mečionis, Rodríguez Rodríguez, Shandilya and Vidal García. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Big Data Šimko, Tibor; Heinrich, Lukas Alexander; Lange, Clemens; Lintuluoto, Adelina Eleonora; MacDonell, Danika Marina; Mečionis, Audrius; Rodríguez Rodríguez, Diego; Shandilya, Parth; Vidal García, Marco Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds |
title | Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds |
title_full | Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds |
title_fullStr | Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds |
title_full_unstemmed | Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds |
title_short | Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds |
title_sort | scalable declarative hep analysis workflows for containerised compute clouds |
topic | Big Data |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8138321/ https://www.ncbi.nlm.nih.gov/pubmed/34027400 http://dx.doi.org/10.3389/fdata.2021.661501 |
work_keys_str_mv | AT simkotibor scalabledeclarativehepanalysisworkflowsforcontainerisedcomputeclouds AT heinrichlukasalexander scalabledeclarativehepanalysisworkflowsforcontainerisedcomputeclouds AT langeclemens scalabledeclarativehepanalysisworkflowsforcontainerisedcomputeclouds AT lintuluotoadelinaeleonora scalabledeclarativehepanalysisworkflowsforcontainerisedcomputeclouds AT macdonelldanikamarina scalabledeclarativehepanalysisworkflowsforcontainerisedcomputeclouds AT mecionisaudrius scalabledeclarativehepanalysisworkflowsforcontainerisedcomputeclouds AT rodriguezrodriguezdiego scalabledeclarativehepanalysisworkflowsforcontainerisedcomputeclouds AT shandilyaparth scalabledeclarativehepanalysisworkflowsforcontainerisedcomputeclouds AT vidalgarciamarco scalabledeclarativehepanalysisworkflowsforcontainerisedcomputeclouds |