
Contributions to Computing needs in High Energy Physics Offline Activities: Towards an efficient exploitation of heterogeneous, distributed and shared Computing Resources


Bibliographic Details
Main author: Boyer, Alexandre Franck
Language: eng
Published: 2023
Subjects:
Online access: http://cds.cern.ch/record/2853058
Description
Summary: Pushing the boundaries of science and providing more advanced services to individuals and communities continuously demand more sophisticated software, specialized hardware, and growing amounts of computing power and storage. At the beginning of the 2020s, we are entering a heterogeneous and distributed computing era where resources will be limited and constrained. Grid communities need to adapt their approach: (i) applications need to support various architectures; (ii) workload management systems have to manage various computing paradigms and guarantee the proper execution of the applications, regardless of the constraints of the underlying systems. This thesis focuses on the latter point through the case of the LHCb experiment. The LHCb collaboration currently relies on the Worldwide LHC Computing Grid, an infrastructure involving 170 computing centers across the world, to process a growing number of Monte Carlo simulations that reproduce the conditions of the experiment. Despite its size, this infrastructure will be unable to handle the simulations coming from the next LHC runs in a reasonable time. In the meantime, national science programs are consolidating computing resources and encouraging the use of supercomputers, which provide tremendous computing power but pose greater integration challenges. In this thesis, we propose different approaches to supply distributed and shared computing resources with LHCb tasks. We developed methods to increase the number of computing resource allocations and their duration, which improved the LHCb job throughput on a grid infrastructure by 40.86%. We also designed a series of software solutions to address issues in highly constrained environments that can be found in supercomputers, such as lack of external connectivity and software dependencies. We have applied those concepts to leverage computing power from four partitions of supercomputers ranked in the Top500.