ATLAS Global Shares implementation in PanDA

Bibliographic Details
Main Authors: Barreiro Megino, Fernando, Di Girolamo, Alessandro, De, Kaushik, Maeno, Tadashi, Walker, Rodney
Language: eng
Published: 2018
Subjects: Particle Physics - Experiment
Online Access: https://dx.doi.org/10.1051/epjconf/201921403025
http://cds.cern.ch/record/2648479
_version_ 1780960683777589248
author Barreiro Megino, Fernando
Di Girolamo, Alessandro
De, Kaushik
Maeno, Tadashi
Walker, Rodney
author_sort Barreiro Megino, Fernando
collection CERN
description PanDA (Production and Distributed Analysis) is the workload management system for ATLAS across the Worldwide LHC Computing Grid. While analysis tasks are submitted to PanDA by over a thousand users following personal schedules (e.g. PhD or conference deadlines), production campaigns are scheduled by a central Physics Coordination group based on the organization’s calendar. The Physics Coordination group needs to allocate the Grid resources dedicated to each activity, in order to manage the sharing of CPU resources among parallel campaigns and to ensure that results are ready in time for important deadlines. While dynamic and static shares on batch systems have existed for a long time, we are trying to move away from local resource partitioning and to manage shares at a global level in the PanDA system. The global solution is not straightforward, given the different requirements of the activities (number of cores, memory, I/O and CPU intensity), the heterogeneity of Grid resources (site/hardware capabilities, batch configuration and queue setup) and constraints on data locality. We have therefore started the Global Shares project, which follows a requirements-driven, multi-step execution plan: defining nestable shares, implementing share-aware job dispatch, aligning internal processes with global shares and, finally, implementing a pilot stream control that manages batch slots while preserving late binding. This contribution explains the development work and architectural changes in PanDA to implement Global Shares, and describes how the project has enabled central control of resources and significantly reduced manual operations. (An illustrative sketch of the nestable-shares idea follows at the end of this record.)
id cern-2648479
institution European Organization for Nuclear Research
language eng
publishDate 2018
record_format invenio
spelling cern-2648479 2022-08-10T12:21:44Z doi:10.1051/epjconf/201921403025 http://cds.cern.ch/record/2648479 eng
Barreiro Megino, Fernando; Di Girolamo, Alessandro; De, Kaushik; Maeno, Tadashi; Walker, Rodney
ATLAS Global Shares implementation in PanDA
Particle Physics - Experiment
[Abstract as in the description field above, here closing with:] "This contribution will explain the development work and architectural changes in PanDA to implement Global Shares, and provide an operational point of view with the difficulties we found along the way."
ATL-SOFT-PROC-2018-022
oai:cds.cern.ch:2648479
2018-11-21
title ATLAS Global Shares implementation in PanDA
title_sort atlas global shares implementation in panda
topic Particle Physics - Experiment
url https://dx.doi.org/10.1051/epjconf/201921403025
http://cds.cern.ch/record/2648479
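
The nestable shares and share-aware dispatch described in the abstract can be pictured with a small sketch. The snippet below is only an illustration and is not code from PanDA: the class names, share names and percentages are invented. It models a share tree in which each leaf's absolute target is the product of the fractions along its path from the root, and then picks the leaf that is furthest below its target, which is the essence of a share-aware dispatcher.

# Illustrative sketch only: a toy model of nestable shares and a share-aware
# choice of the most under-served leaf. Names and numbers are hypothetical
# and do not come from the PanDA code base.

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Share:
    """A node in the share tree; 'value' is this node's fraction of its parent."""
    name: str
    value: float                               # e.g. 0.8 = 80% of the parent share
    children: List["Share"] = field(default_factory=list)

    def leaves(self, parent_fraction: float = 1.0):
        """Yield (leaf, absolute target fraction) pairs, multiplying fractions down the tree."""
        absolute = parent_fraction * self.value
        if not self.children:
            yield self, absolute
        else:
            for child in self.children:
                yield from child.leaves(absolute)


def most_under_target(root: Share, running: Dict[str, int]) -> Optional[str]:
    """Return the name of the leaf share whose current usage is furthest below its target."""
    total = sum(running.values()) or 1
    best_name, best_deficit = None, 0.0
    for leaf, target in root.leaves():
        usage = running.get(leaf.name, 0) / total
        if target - usage > best_deficit:
            best_name, best_deficit = leaf.name, target - usage
    return best_name


# Hypothetical share tree: 80% production (split 70/30 into MC and reprocessing),
# 20% analysis. Leaf targets become 56%, 24% and 20% of the total resources.
tree = Share("ATLAS", 1.0, [
    Share("production", 0.8, [
        Share("MC simulation", 0.7),
        Share("reprocessing", 0.3),
    ]),
    Share("analysis", 0.2),
])

# Invented running-core counts: reprocessing is furthest below its target,
# so a share-aware dispatcher would prefer its jobs next.
running_cores = {"MC simulation": 6000, "reprocessing": 1000, "analysis": 3000}
print(most_under_target(tree, running_cores))  # -> reprocessing

In this toy example the reprocessing share is running well below its 24% target, so its jobs would be dispatched first; a real dispatcher would also have to respect the per-activity resource requirements and data-locality constraints mentioned in the abstract.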