Analysis and improvement of the DIRAC CPU Performance Benchmarking tool


Bibliographic details
Main author: Iraoui, Imane
Language: English
Published: 2021
Subjects:
Online access: http://cds.cern.ch/record/2777969
Description
Summary: DB12, short for DIRAC Benchmark 12, is a fast benchmark tool designed to run within DIRAC and used to evaluate the performance of worker nodes at run time. DB12 gives a faster and better estimate of the CPU power for LHCb applications. It is also run from inside the DIRAC pilot before a job is fetched. Combined with the remaining CPU time, which can be obtained by querying the batch system, this gives an idea of which jobs can be run.

However, DB12 has some issues:
• It does not support Python 3, as it is written entirely in Python 2.
• It has no CI/CD.
• It is copy-pasted into DIRAC instead of being imported.
• It does not run well in multi-core environments. DB12 includes a function to run it on multiple cores in parallel, but it takes a long time to run and the accuracy of the resulting scores is uncertain.

On the Santos Dumont supercomputer, for instance, about 20% of jobs fail because they run out of time (see Figure 1). Currently, on multi-core nodes DB12 is run on only one of the cores. This is not enough: DB12 should be run in parallel on every core and the lowest score should be used as the reference, to make sure that jobs do not run out of time.

To fix these issues, my work consisted of two parts, a coding part and an analysis part, which I discuss in further detail in this report. In the first section, I will give some context about CERN and the LHCb experiment. Then, in section 2, I will describe the project in detail, discuss my work in section 3, and finally conclude the report.
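The two ideas in the summary, running the benchmark concurrently on every core while keeping the lowest score, and judging which jobs fit by comparing a job's work against power × CPU time left, can be sketched in Python, the language DB12 is written in. This is a minimal sketch under stated assumptions. The workload, the score unit (millions of iterations per CPU second) and the names `single_core_score`, `benchmark_all_cores`, `reference_power` and `can_run_job` are hypothetical and are not DB12's or DIRAC's actual API.

```python
import multiprocessing
import os
import random
import time
from typing import List, Optional


def single_core_score(n_ops: int = 1_000_000, seed: int = 0) -> float:
    """Time a stand-in CPU-bound workload and return a score.

    Hypothetical unit: millions of workload iterations per CPU second,
    so a faster core yields a higher score. This is NOT the DB12 kernel.
    """
    rng = random.Random(seed)
    start = time.process_time()  # CPU time consumed by this process only
    total = 0.0
    for _ in range(n_ops):
        total += rng.normalvariate(10.0, 1.0)
    cpu_seconds = max(time.process_time() - start, 1e-9)  # avoid division by zero
    return n_ops / cpu_seconds / 1e6


def _worker(n_ops: int, seed: int, queue) -> None:
    # Runs in a child process and sends its score back to the parent.
    queue.put(single_core_score(n_ops, seed))


def benchmark_all_cores(n_workers: Optional[int] = None, n_ops: int = 1_000_000) -> List[float]:
    """Start one benchmark process per worker concurrently and collect the scores."""
    n_workers = n_workers or os.cpu_count() or 1
    # Prefer "fork" where available so children do not re-import this module;
    # otherwise use the platform's default start method.
    method = "fork" if "fork" in multiprocessing.get_all_start_methods() else None
    ctx = multiprocessing.get_context(method)
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=_worker, args=(n_ops, seed, queue))
        for seed in range(n_workers)
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining: joining first can deadlock.
    scores = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    return scores


def reference_power(scores: List[float]) -> float:
    """Conservative node power: the slowest core sets the reference."""
    return min(scores)


def can_run_job(job_work: float, power: float, cpu_time_left: float) -> bool:
    """True if the job's estimated work fits in the remaining CPU budget.

    job_work: hypothetical estimate in (score unit) x CPU-seconds.
    cpu_time_left: seconds of CPU time reported by the batch system.
    """
    return job_work <= power * cpu_time_left


if __name__ == "__main__":
    scores = benchmark_all_cores(n_workers=min(4, os.cpu_count() or 1), n_ops=200_000)
    power = reference_power(scores)
    print("per-core scores:", [round(s, 2) for s in scores])
    print("reference power:", round(power, 2))
    # Hypothetical job needing 3600 score-seconds, with 7200 s of CPU time left.
    print("job fits:", can_run_job(3600.0, power, 7200.0))
```

Taking the minimum is deliberately conservative: if every core is assumed to be as slow as the slowest one, a job scheduled on any core should not overrun its CPU time. The processes in this sketch are not pinned to specific cores; the operating system scheduler decides where each one runs.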