ATLAS Job Transforms: A Data Driven Workflow Engine

The need to run complex workflows for a high energy physics experiment such as ATLAS has always been present. However, as computing resources have become even more constrained, compared to the wealth of data generated by the LHC, the need to use resources efficiently and manage complex workflows wit...

Bibliographic Details
Main authors: Stewart, G A, Breaden-Madden, W B, Maddocks, H J, Harenberg, T, Sandhoff, M, Sarrazin, B
Language: eng
Published: 2014
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/513/3/032094
http://cds.cern.ch/record/2026326
_version_ 1780947348262748160
author Stewart, G A
Breaden-Madden, W B
Maddocks, H J
Harenberg, T
Sandhoff, M
Sarrazin, B
author_facet Stewart, G A
Breaden-Madden, W B
Maddocks, H J
Harenberg, T
Sandhoff, M
Sarrazin, B
author_sort Stewart, G A
collection CERN
description The need to run complex workflows for a high energy physics experiment such as ATLAS has always been present. However, as computing resources have become even more constrained, compared to the wealth of data generated by the LHC, the need to use resources efficiently and manage complex workflows within a single grid job has increased. In ATLAS, a new Job Transform framework has been developed that we describe in this paper. This framework manages the multiple execution steps needed to 'transform' one data type into another (e.g., RAW data to ESD to AOD to final ntuple) and also provides a consistent interface for the ATLAS production system. The new framework uses a data-driven workflow definition which is both easy to manage and powerful. After a transform is defined, jobs are expressed simply by specifying the input data and the desired output data. The transform infrastructure then executes only the necessary substeps to produce the final data products. The global execution cost of running the job is minimised and the transform can adapt to scenarios where data can be produced along different execution paths. Transforms for specific physics tasks which support up to 60 individual substeps have been successfully run. As the new transforms infrastructure has been deployed in production, many features have been added to the framework which improve reliability and quality of error reporting, and also provide support for multi-process jobs.
id oai-inspirehep.net-1302062
institution European Organization for Nuclear Research
language eng
publishDate 2014
record_format invenio
spelling oai-inspirehep.net-1302062
2022-08-17T13:29:08Z
doi:10.1088/1742-6596/513/3/032094
http://cds.cern.ch/record/2026326
eng
Stewart, G A
Breaden-Madden, W B
Maddocks, H J
Harenberg, T
Sandhoff, M
Sarrazin, B
ATLAS Job Transforms: A Data Driven Workflow Engine
Computing and Computers
The need to run complex workflows for a high energy physics experiment such as ATLAS has always been present. However, as computing resources have become even more constrained, compared to the wealth of data generated by the LHC, the need to use resources efficiently and manage complex workflows within a single grid job has increased. In ATLAS, a new Job Transform framework has been developed that we describe in this paper. This framework manages the multiple execution steps needed to 'transform' one data type into another (e.g., RAW data to ESD to AOD to final ntuple) and also provides a consistent interface for the ATLAS production system. The new framework uses a data-driven workflow definition which is both easy to manage and powerful. After a transform is defined, jobs are expressed simply by specifying the input data and the desired output data. The transform infrastructure then executes only the necessary substeps to produce the final data products. The global execution cost of running the job is minimised and the transform can adapt to scenarios where data can be produced along different execution paths. Transforms for specific physics tasks which support up to 60 individual substeps have been successfully run. As the new transforms infrastructure has been deployed in production, many features have been added to the framework which improve reliability and quality of error reporting, and also provide support for multi-process jobs.
oai:inspirehep.net:1302062
2014
spellingShingle Computing and Computers
Stewart, G A
Breaden-Madden, W B
Maddocks, H J
Harenberg, T
Sandhoff, M
Sarrazin, B
ATLAS Job Transforms: A Data Driven Workflow Engine
title ATLAS Job Transforms: A Data Driven Workflow Engine
title_full ATLAS Job Transforms: A Data Driven Workflow Engine
title_fullStr ATLAS Job Transforms: A Data Driven Workflow Engine
title_full_unstemmed ATLAS Job Transforms: A Data Driven Workflow Engine
title_short ATLAS Job Transforms: A Data Driven Workflow Engine
title_sort atlas job transforms: a data driven workflow engine
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/513/3/032094
http://cds.cern.ch/record/2026326
work_keys_str_mv AT stewartga atlasjobtransformsadatadrivenworkflowengine
AT breadenmaddenwb atlasjobtransformsadatadrivenworkflowengine
AT maddockshj atlasjobtransformsadatadrivenworkflowengine
AT harenbergt atlasjobtransformsadatadrivenworkflowengine
AT sandhoffm atlasjobtransformsadatadrivenworkflowengine
AT sarrazinb atlasjobtransformsadatadrivenworkflowengine
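
Illustrative sketch (not part of the catalog record). The description field states that a job is expressed by its input data and desired output data, and that the transform then runs only the substeps needed, at minimal overall execution cost, even when several execution paths exist. The Python sketch below shows one way such a selection could work, assuming each substep consumes exactly one data type and produces exactly one, and using a cheapest-path search (Dijkstra's algorithm) over data types. This is a simplification for illustration, not the ATLAS implementation: the class `Substep`, the function `plan_substeps`, the substep names, and all cost values are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Substep:
    """One execution step (hypothetical model: one input type, one output type)."""
    name: str
    consumes: str
    produces: str
    cost: float = 1.0


def plan_substeps(substeps, input_type, output_type):
    """Return the names of the cheapest chain of substeps that turns
    input_type into output_type, using Dijkstra's algorithm over data types."""
    best = {input_type: 0.0}              # cheapest known cost to reach each data type
    queue = [(0.0, 0, input_type, [])]    # (cost so far, tie-breaker, data type, chain)
    counter = 1                           # unique tie-breaker so lists are never compared
    while queue:
        cost, _, data_type, chain = heapq.heappop(queue)
        if data_type == output_type:
            return chain
        if cost > best.get(data_type, float("inf")):
            continue                      # stale queue entry, a cheaper route was found
        for step in substeps:
            if step.consumes != data_type:
                continue
            new_cost = cost + step.cost
            if new_cost < best.get(step.produces, float("inf")):
                best[step.produces] = new_cost
                heapq.heappush(
                    queue, (new_cost, counter, step.produces, chain + [step.name])
                )
                counter += 1
    raise ValueError(f"no chain of substeps produces {output_type} from {input_type}")


# Hypothetical substeps and costs; only the data type names come from the description.
STEPS = [
    Substep("RAWtoESD", "RAW", "ESD", cost=10.0),
    Substep("ESDtoAOD", "ESD", "AOD", cost=3.0),
    Substep("AODtoNTUP", "AOD", "NTUP", cost=1.0),
    Substep("ESDtoNTUP", "ESD", "NTUP", cost=6.0),  # alternative, more expensive path
]

if __name__ == "__main__":
    print(plan_substeps(STEPS, "RAW", "NTUP"))
    print(plan_substeps(STEPS, "ESD", "AOD"))
```

With these made-up costs, asking for NTUP from RAW selects the three-step chain through AOD because it is cheaper than the direct ESD-to-NTUP step, while asking for AOD from ESD runs a single substep. A real transform would also need to handle substeps with several inputs or outputs, which this sketch does not model.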