
Optimisation of LHCb Applications for Multi- and Manycore Job Submission

Bibliographic Details
Main author: Rauschmayr, Nathalie
Language: eng
Published: 2015
Subjects:
Online access: http://cds.cern.ch/record/1985236
Description
Summary: The Worldwide LHC Computing Grid (WLCG) is the largest computing grid in the world and is used by all Large Hadron Collider experiments to process their recorded data. It provides approximately 400,000 cores as well as storage. Nowadays, most of these resources are multi- and manycore processors. Conditions at the Large Hadron Collider experiments will change, and much larger workloads and jobs consuming more memory are expected in the future. This has led to a paradigm shift towards executing jobs as multiprocessor tasks in order to use multi- and manycore processors more efficiently. All experiments at CERN are currently investigating how such computing resources can be used more efficiently in terms of memory requirements and handling of concurrency. Many issues regarding software, scheduling, CPU accounting, and task queues remain unsolved and must be addressed by grid sites and experiments.

This thesis develops a systematic approach to optimising the software of the LHCb experiment for multi- and manycore processors. This implies optimisation at the levels under the control of LHCb's Workload Management System. First, the thesis analyses the limitations of the software and how to improve it using intrusive and non-intrusive techniques. In this scope, it discusses the applicability of parallelization concepts to High Energy Physics software. A parallel prototype is evaluated in extensive benchmarks, which include measurements of memory reduction, runtime, and hardware performance counters, as well as tests of the correctness of the output data. Tools for automatic memory deduplication and compression are evaluated in the context of non-intrusive optimisation. The thesis also discusses how the change from 32- to 64-bit affected LHCb software and how the software can profit from the new x32-ABI platform model.

Executing jobs as parallel tasks must also be supported by the grid sites. It is still an open question whether the scheduling of multiprocessor tasks is the responsibility of the Virtual Organization (VO) or of the grid site. Since the Virtual Organization has insight into job parameters and past workloads, the thesis proposes a moldable job scheduler which optimises the job throughput of the VO's task queues. Moldability implies that a job can be executed with an arbitrary number of processes, and it is up to the scheduler to determine the best value. The thesis therefore defines a scheduling problem that meets the requirements of LHCb jobs and evaluates different local search methods.

The Worldwide LHC Computing Grid is a highly dynamic system offering a large variety of computing resources. Moreover, experiment conditions and software change frequently, which has a significant impact on the generated workloads. It is important that a scheduler learns these changing conditions over time. Consequently, the thesis undertakes a detailed analysis of LHCb workloads and shows how job requirements can be better predicted by a supervised learning algorithm.
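The summary does not spell out the design of the parallel prototype, but the memory-reduction idea it refers to can be illustrated with a fork-based worker model: large read-only state is loaded once in the parent and inherited copy-on-write by the workers. The following minimal Python sketch assumes such a model; SHARED_CONDITIONS and process_event are hypothetical names, not identifiers from the thesis.

    import multiprocessing as mp

    # Hypothetical stand-in for large read-only state (e.g. geometry or
    # conditions data) that a real application would load once per job.
    SHARED_CONDITIONS = None

    def process_event(event_id):
        # Read-only access keeps the pages inherited at fork() shared
        # between workers; writing to them would trigger private copies.
        return event_id, len(SHARED_CONDITIONS)

    if __name__ == "__main__":
        SHARED_CONDITIONS = list(range(1_000_000))  # load before forking
        ctx = mp.get_context("fork")  # fork start method enables page sharing
        with ctx.Pool(processes=4) as pool:
            results = pool.map(process_event, range(100))
        print(len(results), "events processed")

With this layout, n workers need roughly one copy of the shared state instead of n, which is the kind of memory saving the benchmarks mentioned above would measure.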
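The moldable scheduling idea can also be sketched: given a fixed core budget and a per-job speedup model, the scheduler itself chooses how many processes each job receives, here by a simple local search that minimises the makespan of one batch. The Amdahl-style speedup model, the job parameters, and the objective below are illustrative assumptions, not the exact formulation developed in the thesis.

    import random

    TOTAL_CORES = 16

    # Hypothetical jobs: single-process runtime and serial fraction of
    # an Amdahl-style speedup model.
    JOBS = [
        {"base_runtime": 100.0, "serial_fraction": 0.05},
        {"base_runtime": 250.0, "serial_fraction": 0.20},
        {"base_runtime": 180.0, "serial_fraction": 0.10},
    ]

    def runtime(job, cores):
        s = job["serial_fraction"]
        return job["base_runtime"] * (s + (1.0 - s) / cores)

    def batch_makespan(alloc):
        return max(runtime(j, c) for j, c in zip(JOBS, alloc))

    def local_search(steps=1000):
        # Start with one process per job, then spread the remaining
        # cores round-robin.
        alloc = [1] * len(JOBS)
        i = 0
        while sum(alloc) < TOTAL_CORES:
            alloc[i % len(JOBS)] += 1
            i += 1
        best = batch_makespan(alloc)
        for _ in range(steps):
            # Neighbourhood move: shift one core from job a to job b.
            a, b = random.sample(range(len(JOBS)), 2)
            if alloc[a] <= 1:
                continue
            alloc[a] -= 1; alloc[b] += 1
            cost = batch_makespan(alloc)
            if cost < best:
                best = cost
            else:
                alloc[a] += 1; alloc[b] -= 1  # revert a worsening move
        return alloc, best

    if __name__ == "__main__":
        alloc, span = local_search()
        print("processes per job:", alloc, "batch makespan:", round(span, 1))

The neighbourhood move (shifting a single core between two jobs) is one of many possible local search operators; the thesis evaluates several such methods against the requirements of LHCb jobs.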
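Finally, the workload-prediction step can be illustrated with an off-the-shelf regressor trained on historical job records. The summary does not name the supervised learning algorithm; scikit-learn's RandomForestRegressor and the synthetic features below (event count, application version, input size) are assumptions chosen only to make the sketch self-contained.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for historical workload records:
    # (number of events, application version index, input size in GB) -> runtime.
    rng = np.random.default_rng(0)
    X = rng.uniform([1e3, 0, 0.1], [1e6, 5, 20.0], size=(500, 3))
    y = 0.002 * X[:, 0] + 30 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 10, 500)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    print("R^2 on held-out jobs:", round(model.score(X_test, y_test), 3))

A predictor of this kind gives the moldable scheduler the job-requirement estimates it needs, and retraining on recent records lets it track the changing experiment conditions described above.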