Job prioritization in LHCb
LHCb is one of the four large high-energy physics experiments that will soon be running at the Large Hadron Collider (LHC) at CERN. LHCb will try to answer some fundamental questions about the asymmetry between matter and antimatter. The experiment is expected to produce about 2 PB of data per year, which will be distributed to several laboratories across Europe and analyzed by the physics community. To achieve this, LHCb makes full use of the Grid to reprocess, replicate and analyze data. Access to the Grid goes through LHCb's own distributed production and analysis system, DIRAC (Distributed Infrastructure with Remote Agent Control). DIRAC implements the 'pull' job-scheduling paradigm, in which all jobs are stored in central task queues and then pulled by generic Grid jobs called Pilot Agents. The LHCb community (about 600 people) comprises physicists, developers, and production and software managers whose jobs place different demands on the Grid: a Monte Carlo simulation job needs several days of intensive CPU time, whereas an analysis job needs to start almost immediately. The current state of affairs, in which all users access the Grid through a single entry point, does not prevent certain sub-communities from running most of the jobs and monopolizing Grid resources. The way to avoid this is a system that enforces job priorities and a fair share of resources among all community users. There are two possible approaches: a site-wise approach, where the VO simply fills its queues and leaves the site-specific software to redistribute jobs according to prior negotiations; and a VO-wise approach, better tailored to the LHCb computing model, where the site merely allocates the quota of resources due to the VO, and the VO decides how to share it among its user sub-communities. A first priority algorithm based on the VO-wise approach has already been implemented. The introduction of a 'Priority' flag in the job specification, together with some changes to the resource-job matching mechanism, has already proved to give the right precedence to short analysis jobs and to reconstruction jobs over CPU-intensive Monte Carlo jobs. This priority algorithm is still a work in progress: accounting information on users, job length and per-community CPU consumption will also be taken into account, and the mechanism needs to be tested extensively. An ageing system will be introduced so that jobs do not stay too long in the central queues before being picked up by the first suitable resource that becomes available. The mechanism relies on the assumption that DIRAC is the only access point to the Grid, but it does not prevent users from bypassing it and accessing the Grid by other means; a tool to enforce VO policy at the site level is therefore highly desirable.
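The abstract describes a 'pull' paradigm in which Pilot Agents drain central task queues, with jobs ordered by a 'Priority' flag plus an ageing bonus for jobs that have waited long. Below is a minimal sketch of such matching logic; DIRAC itself is written in Python, but the class, field names and ageing constant here are illustrative assumptions, not the actual DIRAC Matcher.

```python
import time

class TaskQueue:
    """Minimal sketch of a central task queue for 'pull' scheduling.

    A Pilot Agent running on a Grid resource calls pop_for() to pull the
    highest-priority job whose requirements the resource satisfies.
    Hypothetical names throughout; not the real DIRAC implementation.
    """

    AGEING_RATE = 1.0 / 3600.0  # priority points gained per hour of waiting

    def __init__(self):
        self._jobs = []

    def submit(self, job):
        """job: dict with 'Name', 'Priority' (number) and 'Requirements' (set)."""
        job['SubmitTime'] = time.time()
        self._jobs.append(job)

    def _effective_priority(self, job, now):
        # Static 'Priority' flag plus an ageing bonus, so jobs that wait
        # long in the queue are eventually picked up and do not starve.
        return job['Priority'] + self.AGEING_RATE * (now - job['SubmitTime'])

    def pop_for(self, resource_tags):
        """Called on behalf of a Pilot Agent: pull the best matching job."""
        now = time.time()
        eligible = [j for j in self._jobs if j['Requirements'] <= resource_tags]
        if not eligible:
            return None
        best = max(eligible, key=lambda j: self._effective_priority(j, now))
        self._jobs.remove(best)
        return best

# A short analysis job outranks a long Monte Carlo job on the same resource.
tq = TaskQueue()
tq.submit({'Name': 'mc-simulation', 'Priority': 1, 'Requirements': {'slc4'}})
tq.submit({'Name': 'user-analysis', 'Priority': 8, 'Requirements': {'slc4'}})
print(tq.pop_for({'slc4'})['Name'])  # -> user-analysis
```

The ageing term is what keeps low-priority Monte Carlo jobs from starving once enough analysis jobs have been served first.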
Main author: | Castellani, G |
---|---|
Language: | eng |
Published: | 2007 |
Subjects: | Detectors and Experimental Techniques; Computing and Computers |
Online access: | http://cds.cern.ch/record/1120923 |
_version_ | 1780914568975876096 |
---|---|
author | Castellani, G |
author_facet | Castellani, G |
author_sort | Castellani, G |
collection | CERN |
description | LHCb is one of the four large high-energy physics experiments that will soon be running at the Large Hadron Collider (LHC) at CERN. LHCb will try to answer some fundamental questions about the asymmetry between matter and antimatter. The experiment is expected to produce about 2 PB of data per year, which will be distributed to several laboratories across Europe and analyzed by the physics community. To achieve this, LHCb makes full use of the Grid to reprocess, replicate and analyze data. Access to the Grid goes through LHCb's own distributed production and analysis system, DIRAC (Distributed Infrastructure with Remote Agent Control). DIRAC implements the 'pull' job-scheduling paradigm, in which all jobs are stored in central task queues and then pulled by generic Grid jobs called Pilot Agents. The LHCb community (about 600 people) comprises physicists, developers, and production and software managers whose jobs place different demands on the Grid: a Monte Carlo simulation job needs several days of intensive CPU time, whereas an analysis job needs to start almost immediately. The current state of affairs, in which all users access the Grid through a single entry point, does not prevent certain sub-communities from running most of the jobs and monopolizing Grid resources. The way to avoid this is a system that enforces job priorities and a fair share of resources among all community users. There are two possible approaches: a site-wise approach, where the VO simply fills its queues and leaves the site-specific software to redistribute jobs according to prior negotiations; and a VO-wise approach, better tailored to the LHCb computing model, where the site merely allocates the quota of resources due to the VO, and the VO decides how to share it among its user sub-communities. A first priority algorithm based on the VO-wise approach has already been implemented. The introduction of a 'Priority' flag in the job specification, together with some changes to the resource-job matching mechanism, has already proved to give the right precedence to short analysis jobs and to reconstruction jobs over CPU-intensive Monte Carlo jobs. This priority algorithm is still a work in progress: accounting information on users, job length and per-community CPU consumption will also be taken into account, and the mechanism needs to be tested extensively. An ageing system will be introduced so that jobs do not stay too long in the central queues before being picked up by the first suitable resource that becomes available. The mechanism relies on the assumption that DIRAC is the only access point to the Grid, but it does not prevent users from bypassing it and accessing the Grid by other means; a tool to enforce VO policy at the site level is therefore highly desirable. (A fair-share sketch based on this VO-wise approach follows the field listing below.) |
id | cern-1120923 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2007 |
record_format | invenio |
title | Job prioritization in LHCb |
title_full | Job prioritization in LHCb |
title_fullStr | Job prioritization in LHCb |
title_full_unstemmed | Job prioritization in LHCb |
title_short | Job prioritization in LHCb |
title_sort | job prioritization in lhcb |
topic | Detectors and Experimental Techniques; Computing and Computers |
url | http://cds.cern.ch/record/1120923 |
work_keys_str_mv | AT castellanig jobprioritizationinlhcb |
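For the VO-wise approach described in the abstract, the VO must divide the site-granted quota among its sub-communities using accounting information (user, job length, CPU consumption). One plausible way to turn such accounting into priority corrections is a simple fair-share factor; the function name, group names and quota fractions below are illustrative assumptions, not DIRAC's actual algorithm.

```python
def fair_share_priorities(quotas, consumed):
    """Rescale each group's priority by how far it is below (or above)
    its agreed share of recent CPU consumption.

    quotas:   agreed resource fractions per group, summing to 1.0
    consumed: recent CPU-hours per group, from accounting records
    Returns a multiplicative priority factor per group.
    """
    total = sum(consumed.values()) or 1.0
    factors = {}
    for group, share in quotas.items():
        used_fraction = consumed.get(group, 0.0) / total
        # Groups under their quota get boosted; groups over it are damped.
        factors[group] = share / max(used_fraction, 1e-6)
    return factors

print(fair_share_priorities(
    {'production': 0.5, 'analysis': 0.3, 'mc': 0.2},
    {'production': 700.0, 'analysis': 100.0, 'mc': 200.0}))
# analysis used only 10% against a 30% quota -> largest boost (3.0);
# production used 70% against 50% -> damped (about 0.71).
```

A factor like this could multiply the static 'Priority' flag before the resource-job matching step, so that sub-communities that have recently consumed less than their share are served first.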