AutoPyFactory: A Scalable Flexible Pilot Factory Implementation

Bibliographic Details
Main authors: Caballero, J, Hover, J, Love, P, Stewart, G
Language: eng
Published: 2012
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/1450167
author Caballero, J
Hover, J
Love, P
Stewart, G
collection CERN
description The ATLAS experiment at the CERN LHC is one of the largest users of grid computing infrastructure, which is a central part of the experiment’s computing operations. Considerable efforts have been made to use grid technology in the most efficient and effective way, including the use of a pilot-job-based workload management framework. In this model, the experiment submits ‘pilot’ jobs without a payload to sites. When these jobs begin to run, they contact a central service to retrieve a real payload to execute. First-generation pilot factories were usually specific to a single virtual organization (VO) and were bound to the particular architecture of that VO’s distributed processing. A second generation provides factories that are more flexible, are not tied to any particular VO, and provide new or improved features such as monitoring, logging, and profiling. In this paper we describe this key part of the ATLAS pilot architecture, a second-generation pilot factory, AutoPyFactory. AutoPyFactory has a modular design and is highly configurable. It is able to send different types of pilots to sites and exploit a range of submission mechanisms and queue characteristics. It is tightly integrated with the PanDA job submission framework, coupling pilot flow to the amount of work the site has to run. It gathers information from many sources to configure itself correctly for a site, and its decision logic can easily be modified. Integrated into AutoPyFactory is a flexible system for delivering both generic and specific job wrappers, which can perform many useful actions before starting to run end-user scientific applications, e.g. validation of the middleware, node profiling and diagnostics, and monitoring. AutoPyFactory now also has a robust monitoring system, and we show how this has helped establish a reliable pilot factory service for ATLAS.
id cern-1450167
institution European Organization for Nuclear Research
language eng
publishDate 2012
record_format invenio
spelling cern-1450167; 2019-09-30T06:29:59Z; http://cds.cern.ch/record/1450167; eng; Caballero, J; Hover, J; Love, P; Stewart, G; AutoPyFactory: A Scalable Flexible Pilot Factory Implementation; Detectors and Experimental Techniques; ATL-SOFT-PROC-2012-045; oai:cds.cern.ch:1450167; 2012-05-22
title AutoPyFactory: A Scalable Flexible Pilot Factory Implementation
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/1450167