The ATLAS PanDA Pilot in Operation

Bibliographic Details
Main authors: Nilsson, P; Caballero, J; De, K; Maeno, T; Stradling, A; Wenaus, T
Language: eng
Published: 2011
Online access: http://cds.cern.ch/record/1322425
Description
Summary: The Production and Distributed Analysis system (PanDA) [1-2] was designed to meet ATLAS [3] requirements for a data-driven workload management system capable of operating at LHC data processing scale. Submitted jobs are executed on worker nodes by pilot jobs, which pilot factories send to the grid sites. This paper provides an overview of the PanDA pilot [4] system and presents major features added in light of recent operational experience, including multi-job processing, advanced recovery for jobs with output storage failures, gLExec-based [5-6] identity switching from the generic pilot to the actual user, and other security measures. The PanDA system serves all ATLAS distributed processing and is the primary system for distributed analysis; it is currently used at over 100 sites worldwide. We analyze the performance of the pilot system in processing real LHC data on the OSG [7], EGI [8] and NorduGrid [9-10] infrastructures used by ATLAS, and describe plans for its evolution.