DIRAC Site Director: Improving Pilot-Job provisioning on grid resources
Main authors: | , , , |
---|---|
Language: | eng |
Published: | 2022 |
Subjects: | |
Online access: | https://dx.doi.org/10.1016/j.future.2022.03.002 http://cds.cern.ch/record/2806286 |
Summary: | To study the constituents of matter, CERN mainly relies on the Worldwide LHC Computing Grid
(WLCG), which processes petabytes of data coming from the Large Hadron Collider (LHC). LHC
experiments have adopted the Pilot-Job paradigm, and deliver tools to supply grid resources with
Pilot-Jobs, to efficiently leverage the computing power offered by WLCG. This sole approach will
be insufficient and will need to be complemented to meet future computing needs – of the High-Luminosity LHC – and the rise of data generated over time: national science programs are consolidating
computing resources and encourage using cloud and High-Performance Computing systems. Yet, even
though they have started to integrate their workflows on such infrastructures, LHC experiments still
largely depend on WLCG resources. This paper lays out an approach to increase the throughput of the
jobs, on grid resources, by improving the performance of the Pilot-Job provisioning tools through a case
study: the LHCb-specific solution, known as “DIRAC Site Director”. We propose: (i) a complete analysis
of the capabilities and limitations of the DIRAC Site Director; (ii) several methods to speed up its
execution, including parallel processing as well as bulk operations; (iii) a comprehensive analysis of a
group of Site Directors in the LHCb production environment over 12 months. With our approach, we
recorded a 40.86% increase in the number of jobs processed simultaneously per second, enabling
the simultaneous management of 80,300 LHCb jobs, while only 57,000 of them could be managed
before our improvements. |
---|