ATLAS Global Shares Implementation in the PanDA Workload Management System
Main author: | Barreiro Megino, Fernando Harald |
---|---|
Language: | eng |
Published: | 2018 |
Subjects: | Particle Physics - Experiment |
Online access: | http://cds.cern.ch/record/2626911 |
_version_ | 1780958933133819904 |
---|---|
author | Barreiro Megino, Fernando Harald |
author_facet | Barreiro Megino, Fernando Harald |
author_sort | Barreiro Megino, Fernando Harald |
collection | CERN |
description | PanDA (Production and Distributed Analysis) is the workload management system for ATLAS across the Worldwide LHC Computing Grid. While analysis tasks are submitted to PanDA by over a thousand users following personal schedules (e.g. PhD or conference deadlines), production campaigns are scheduled by a central Physics Coordination group based on the organization’s calendar. The Physics Coordination group needs to allocate the Grid resources dedicated to each activity, in order to manage the sharing of CPU resources among various parallel campaigns and to ensure that results can be achieved in time for important deadlines. While dynamic and static shares on batch systems have been around for a long time, we are trying to move away from local resource partitioning and to manage shares at a global level in the PanDA system. The global solution is not straightforward, given the different requirements of the activities (number of cores, memory, I/O and CPU intensity), the heterogeneity of Grid resources (site/HW capabilities, batch configuration and queue setup) and constraints on data locality. We have therefore started the Global Shares project, which follows a requirements-driven, multi-step execution plan: defining nestable shares, implementing share-aware job dispatch, aligning internal processes with global shares and, finally, implementing pilot stream control to manage batch slots while keeping late binding. This contribution explains the development work and architectural changes made in PanDA to implement Global Shares, and provides an operational point of view, including the difficulties we encountered along the way. (A minimal illustrative sketch of nestable shares and share-aware dispatch follows the record fields below.) |
id | cern-2626911 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2018 |
record_format | invenio |
spelling | cern-2626911 2019-11-14T08:33:45Z http://cds.cern.ch/record/2626911 eng Barreiro Megino, Fernando Harald ATLAS Global Shares Implementation in the PanDA Workload Management System Particle Physics - Experiment PanDA (Production and Distributed Analysis) is the workload management system for ATLAS across the Worldwide LHC Computing Grid. While analysis tasks are submitted to PanDA by over a thousand users following personal schedules (e.g. PhD or conference deadlines), production campaigns are scheduled by a central Physics Coordination group based on the organization’s calendar. The Physics Coordination group needs to allocate the Grid resources dedicated to each activity, in order to manage the sharing of CPU resources among various parallel campaigns and to ensure that results can be achieved in time for important deadlines. While dynamic and static shares on batch systems have been around for a long time, we are trying to move away from local resource partitioning and to manage shares at a global level in the PanDA system. The global solution is not straightforward, given the different requirements of the activities (number of cores, memory, I/O and CPU intensity), the heterogeneity of Grid resources (site/HW capabilities, batch configuration and queue setup) and constraints on data locality. We have therefore started the Global Shares project, which follows a requirements-driven, multi-step execution plan: defining nestable shares, implementing share-aware job dispatch, aligning internal processes with global shares and, finally, implementing pilot stream control to manage batch slots while keeping late binding. This contribution explains the development work and architectural changes made in PanDA to implement Global Shares, and provides an operational point of view, including the difficulties we encountered along the way. ATL-SOFT-SLIDE-2018-413 oai:cds.cern.ch:2626911 2018-06-27 |
spellingShingle | Particle Physics - Experiment Barreiro Megino, Fernando Harald ATLAS Global Shares Implementation in the PanDA Workload Management System |
title | ATLAS Global Shares Implementation in the PanDA Workload Management System |
title_full | ATLAS Global Shares Implementation in the PanDA Workload Management System |
title_fullStr | ATLAS Global Shares Implementation in the PanDA Workload Management System |
title_full_unstemmed | ATLAS Global Shares Implementation in the PanDA Workload Management System |
title_short | ATLAS Global Shares Implementation in the PanDA Workload Management System |
title_sort | atlas global shares implementation in the panda workload management system |
topic | Particle Physics - Experiment |
url | http://cds.cern.ch/record/2626911 |
work_keys_str_mv | AT barreiromeginofernandoharald atlasglobalsharesimplementationinthepandaworkloadmanagementsystem |
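The abstract's notion of nestable shares and share-aware dispatch can be illustrated with a small sketch. The code below is not PanDA code: the `Share` class, the `pick_underused_share` function and the example share values are hypothetical, introduced only to show how a tree of nested shares can resolve to absolute fractions of the total resources, and how a dispatcher could serve the share that is furthest below its target.

```python
# Illustrative sketch only: a toy model of nestable shares and share-aware
# dispatch. The Share class, pick_underused_share() and the example numbers
# are hypothetical and do not correspond to PanDA's actual implementation.

class Share:
    """A node in the share tree; 'value' is this node's fraction of its parent."""

    def __init__(self, name, value, children=None):
        self.name = name
        self.value = value              # e.g. 0.8 means 80% of the parent's share
        self.children = children or []

    def leaves(self, parent_fraction=1.0):
        """Yield (leaf name, absolute fraction of total resources)."""
        absolute = parent_fraction * self.value
        if not self.children:
            yield self.name, absolute
        else:
            for child in self.children:
                yield from child.leaves(absolute)


def pick_underused_share(tree, used_slots):
    """Return the leaf share that is furthest below its target fraction.

    'used_slots' maps leaf name -> currently occupied batch slots.
    """
    total_used = sum(used_slots.values()) or 1
    targets = dict(tree.leaves())
    # Rank leaves by (current usage fraction - target fraction); most starved first.
    return min(targets, key=lambda s: used_slots.get(s, 0) / total_used - targets[s])


if __name__ == "__main__":
    # Hypothetical hierarchy: 80% production (split into MC simulation and
    # reprocessing), 20% analysis.
    root = Share("ATLAS", 1.0, [
        Share("production", 0.8, [
            Share("MC simulation", 0.7),
            Share("reprocessing", 0.3),
        ]),
        Share("analysis", 0.2),
    ])
    usage = {"MC simulation": 500, "reprocessing": 300, "analysis": 200}
    print(dict(root.leaves()))                 # absolute target fraction per leaf
    print(pick_underused_share(root, usage))   # -> 'MC simulation' (most below target)
```

Running the example prints the absolute target fraction of each leaf share and names the share the dispatcher would serve next. The real system additionally has to cope with heterogeneous resources (multi-core, high-memory, I/O-intensive queues) and data locality, which is why, as the abstract describes, share-aware dispatch and pilot stream control were implemented globally inside PanDA rather than as partitions on individual batch systems.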