
Performance optimization for the LHCb experiment

The LHCb experiment, at CERN, is preparing a major upgrade of its detector and a change from a hardware-based to a fully software-based trigger system. It now faces the challenge of processing incoming events at a rate of 30 million events per second. To cope with this massive data i...


Bibliographic Details
Main author: Hennequin, Arthur Marius
Language: eng
Published: 2023
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/2855920
_version_ 1780977484885393408
author Hennequin, Arthur Marius
author_facet Hennequin, Arthur Marius
author_sort Hennequin, Arthur Marius
collection CERN
description The LHCb experiment, at CERN, is preparing a major upgrade of its detector and a change from a hardware-based to a fully software-based trigger system. It now faces the challenge of processing incoming events at a rate of 30 million events per second. To cope with this massive data input, the software must be optimized to use the processing power of the filtering farm more efficiently. This thesis focuses on the first algorithm of LHCb’s High Level Trigger software: the Vertex Locator (VELO) reconstruction algorithm. The VELO is the first detector encountered by particles, directly surrounding the interaction region. Its goal is to find the initial track candidates, which are then followed through the other layers of the LHCb detector, with a resolution good enough that they can also be used to locate the origin of the collisions. The first step of this algorithm is to prepare the data by grouping pixels of the silicon sensors into hits; this process is called connected component analysis (CCA). This thesis presents multiple new CCA algorithms for both CPU and GPU architectures. The first algorithm, HA4, was developed at the very start of this thesis and improved the state of the art in connected component labeling on GPUs, as well as being the first efficient implementation of connected component analysis on GPUs. The second algorithm is a GPU port of the FLSL SIMD CPU algorithm, inspired by the LSL algorithm. FLSL on GPUs improved upon HA4 by reducing the memory access conflicts that are especially present on new hardware with many cores. Along with FLSL, two other optimisations aimed at further reducing conflicts are presented and evaluated. On CPU, two new algorithms were developed for this thesis. The first is a modification of the classic Rosenfeld algorithm to use SIMD. The second is a new algorithm, named SparseCCL, which takes advantage of the sparsity of the input images.
A new VELO reconstruction algorithm using SIMD is presented, which enables LHCb to process events in real time and improves the quality of the reconstruction. The SIMDWrapper library, developed for the new VELO algorithm, is now part of LHCb’s software and is used in other algorithms.
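As an illustration of the pixel-grouping step described in the abstract, the following is a minimal sketch of textbook two-pass connected component labeling with a union-find table, followed by a centroid computation that turns each component into a "hit". It is not the thesis's HA4, FLSL, SIMD Rosenfeld, or SparseCCL algorithm. The function names `label_components` and `component_centroids`, the use of 8-connectivity, and the choice of the centroid as the per-component feature are all assumptions made for this example.

```python
def label_components(image):
    """Two-pass labeling of a binary image (list of rows), 8-connectivity assumed."""
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # equivalence table; index 0 is reserved for the background

    def find(label):
        # Follow parent links to the root, halving the path on the way.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller label as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # First pass: assign provisional labels and record equivalences.
    # Only the already-visited neighbours (left and the three above) are checked.
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            seen = []
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if ny >= 0 and 0 <= nx < width and labels[ny][nx]:
                    seen.append(labels[ny][nx])
            if not seen:
                parent.append(len(parent))  # create a new label
                labels[y][x] = len(parent) - 1
            else:
                labels[y][x] = seen[0]
                for other in seen[1:]:
                    union(seen[0], other)

    # Second pass: replace each provisional label by its representative.
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels


def component_centroids(labels):
    """Return {label: (mean_x, mean_y)} for every non-background label."""
    sums = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label:
                sx, sy, n = sums.get(label, (0, 0, 0))
                sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


if __name__ == "__main__":
    # Small sparse example image, loosely mimicking a sensor with few active pixels.
    image = [
        [1, 1, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0],
        [1, 0, 0, 0, 1],
    ]
    print(component_centroids(label_components(image)))
```

The sketch is sequential; the algorithms named in the abstract differ precisely in how they parallelize this kind of labeling (SIMD on CPU, many threads on GPU) and in how they exploit the sparsity of the input.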
id cern-2855920
institution European Organization for Nuclear Research
language eng
publishDate 2023
record_format invenio
spelling cern-2855920; 2023-04-17T19:23:12Z; http://cds.cern.ch/record/2855920; eng; Hennequin, Arthur Marius; Performance optimization for the LHCb experiment; Detectors and Experimental Techniques; CERN-THESIS-2022-338; oai:cds.cern.ch:2855920; 2023-04-12T10:52:49Z
spellingShingle Detectors and Experimental Techniques
Hennequin, Arthur Marius
Performance optimization for the LHCb experiment
title Performance optimization for the LHCb experiment
title_full Performance optimization for the LHCb experiment
title_fullStr Performance optimization for the LHCb experiment
title_full_unstemmed Performance optimization for the LHCb experiment
title_short Performance optimization for the LHCb experiment
title_sort performance optimization for the lhcb experiment
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/2855920
work_keys_str_mv AT hennequinarthurmarius performanceoptimizationforthelhcbexperiment