Cargando…

The WorkQueue project - a task queue for the CMS workload management system

We present the development and first experience of a new component (termed WorkQueue) in the CMS workload management system. This component provides a link between a global request system (Request Manager) and agents (WMAgents) which process requests at compute and storage resources (known as sites)...

Descripción completa

Detalles Bibliográficos
Autores principales: Ryu, S, Wakefield, Stuart Lee
Lenguaje:eng
Publicado: 2012
Materias:
Acceso en línea:https://dx.doi.org/10.1088/1742-6596/396/3/032114
http://cds.cern.ch/record/1458545
_version_ 1780925163239374848
author Ryu, S
Wakefield, Stuart Lee
author_facet Ryu, S
Wakefield, Stuart Lee
author_sort Ryu, S
collection CERN
description We present the development and first experience of a new component (termed WorkQueue) in the CMS workload management system. This component provides a link between a global request system (Request Manager) and agents (WMAgents) which process requests at compute and storage resources (known as sites). These requests typically consist of creation or processing of a data sample (possibly terabytes in size). Unlike the standard concept of a task queue, the WorkQueue does not contain fully resolved work units (known typically as jobs in HEP). This would require the WorkQueue to run computationally heavy algorithms that are better suited to run in the WMAgents. Instead the request specifies an algorithm that the WorkQueue uses to split the request into reasonable size chunks (known as elements). An advantage of performing lazy evaluation of an element is that expanding datasets can be accommodated by having job details resolved as late as possible. The WorkQueue architecture consists of a global WorkQueue which obtains requests from the request system, expands them and forms an element ordering based on the request priority. Each WMAgent contains a local WorkQueue which buffers work close to the agent, this overcomes temporary unavailability of the global WorkQueue and reduces latency for an agent to begin processing. Elements are pulled from the global WorkQueue to the local WorkQueue and into the WMAgent based on the estimate of the amount of work within the element and the resources available to the agent. WorkQueue is based on CouchDB, a document oriented no-sql database. WorkQueue uses the features of CouchDB (map/reduce views, bi-directional replication between distributed instances) to provide a scalable distributed system for managing large queues of work. The project described here represents an improvement over the old approach to workload management in CMS which involved individual operators feeding requests into agents. This new approach allows for a system where individual WMAgents are transient and can be added or removed from the system as needed.
id cern-1458545
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2012
record_format invenio
spelling cern-14585452019-09-30T06:29:59Zdoi:10.1088/1742-6596/396/3/032114http://cds.cern.ch/record/1458545engRyu, SWakefield, Stuart LeeThe WorkQueue project - a task queue for the CMS workload management systemDetectors and Experimental TechniquesWe present the development and first experience of a new component (termed WorkQueue) in the CMS workload management system. This component provides a link between a global request system (Request Manager) and agents (WMAgents) which process requests at compute and storage resources (known as sites). These requests typically consist of creation or processing of a data sample (possibly terabytes in size). Unlike the standard concept of a task queue, the WorkQueue does not contain fully resolved work units (known typically as jobs in HEP). This would require the WorkQueue to run computationally heavy algorithms that are better suited to run in the WMAgents. Instead the request specifies an algorithm that the WorkQueue uses to split the request into reasonable size chunks (known as elements). An advantage of performing lazy evaluation of an element is that expanding datasets can be accommodated by having job details resolved as late as possible. The WorkQueue architecture consists of a global WorkQueue which obtains requests from the request system, expands them and forms an element ordering based on the request priority. Each WMAgent contains a local WorkQueue which buffers work close to the agent, this overcomes temporary unavailability of the global WorkQueue and reduces latency for an agent to begin processing. Elements are pulled from the global WorkQueue to the local WorkQueue and into the WMAgent based on the estimate of the amount of work within the element and the resources available to the agent. WorkQueue is based on CouchDB, a document oriented no-sql database. WorkQueue uses the features of CouchDB (map/reduce views, bi-directional replication between distributed instances) to provide a scalable distributed system for managing large queues of work. The project described here represents an improvement over the old approach to workload management in CMS which involved individual operators feeding requests into agents. This new approach allows for a system where individual WMAgents are transient and can be added or removed from the system as needed.CMS-CR-2012-106oai:cds.cern.ch:14585452012-05-16
spellingShingle Detectors and Experimental Techniques
Ryu, S
Wakefield, Stuart Lee
The WorkQueue project - a task queue for the CMS workload management system
title The WorkQueue project - a task queue for the CMS workload management system
title_full The WorkQueue project - a task queue for the CMS workload management system
title_fullStr The WorkQueue project - a task queue for the CMS workload management system
title_full_unstemmed The WorkQueue project - a task queue for the CMS workload management system
title_short The WorkQueue project - a task queue for the CMS workload management system
title_sort workqueue project - a task queue for the cms workload management system
topic Detectors and Experimental Techniques
url https://dx.doi.org/10.1088/1742-6596/396/3/032114
http://cds.cern.ch/record/1458545
work_keys_str_mv AT ryus theworkqueueprojectataskqueueforthecmsworkloadmanagementsystem
AT wakefieldstuartlee theworkqueueprojectataskqueueforthecmsworkloadmanagementsystem
AT ryus workqueueprojectataskqueueforthecmsworkloadmanagementsystem
AT wakefieldstuartlee workqueueprojectataskqueueforthecmsworkloadmanagementsystem