Continual Learning in the CMS Phase-2 Level-1 Trigger

Bibliographic Details
Main author: CMS Collaboration
Language: English
Published: 2023
Online access: http://cds.cern.ch/record/2859651
Description
Summary: Machine Learning (ML) algorithms are a key enhancement of the CMS Level-1 (L1) Trigger Phase-2 upgrade. However, they may lack robustness against domain shift caused by changing experimental conditions. This is of particular importance in the L1 Trigger, where events that are not triggered cannot be recovered. Because of the low-latency, low-resource environment of the L1 Trigger, these models have to be simple and compressed, which means they lack the robustness of the larger, more complex models that would be used offline. They are also typically focussed on a subset of the detector data, so they are particularly vulnerable to changes in single detector subsystems, whereas the High Level Trigger and offline analysis correlate information from all subsystems.

There are different time scales at which experimental changes are expected and at which subsequent model retraining could happen. At the fastest time scale, a few seconds, the beam conditions can fluctuate slightly, and an embedded approach to model updating could be performed. At the scale of several hours, or one LHC fill, there could be changes in beam conditions or sudden detector changes, such as cooling or high-voltage issues; at this level a small dataset could be collected, for example from 40 MHz scouting [CMS-CR-2023-024], and used to update models for redeployment. This workflow for updating trigger algorithms is already used for the recalibration of the ECAL, where the previous fill is used to evaluate the current calibration and update it if necessary [CMS-DP-2022-042] [CMS-DP-2022-068]. Finally, at a longer time scale of several months, over which significant detector changes could occur, larger Monte Carlo (MC) campaigns emulating the detector changes would be prepared, and models would be entirely retrained and redeployed. We investigate the use of a Continual Learning (CL) approach to the problem of a changing environment, in which an ML model is continually updated using a stream of labelled data.
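The Continual Learning idea (a model updated from a stream of labelled data and, as in the ECAL workflow, evaluated on the latest data before deciding whether to update) can be illustrated with a toy example. This is a minimal pure-Python sketch under stated assumptions, not the CMS implementation: a two-feature logistic-regression classifier stands in for the L1 model, synthetic labelled batches stand in for scouting data, a shifted decision boundary mimics a change in detector conditions, and the function names, accuracy threshold, and learning rate are all hypothetical.

```python
import math
import random

# Hypothetical toy model and data: not the CMS L1 model or the scouting format.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Probability that feature vector x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def accuracy(w, b, batch):
    """Fraction of (x, y) pairs classified correctly at a 0.5 cut."""
    correct = sum((predict(w, b, x) >= 0.5) == (y == 1) for x, y in batch)
    return correct / len(batch)


def sgd_update(w, b, batch, lr=0.1, epochs=20):
    """Continue training (w, b) with plain SGD on the log loss of one batch."""
    w = list(w)
    for _ in range(epochs):
        for x, y in batch:
            err = predict(w, b, x) - y  # d(log loss)/dz for the logistic model
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def make_batch(rng, shift, n=200):
    """Synthetic labelled batch: label 1 if x0 + x1 > shift.

    Changing `shift` between batches mimics a change in detector conditions.
    """
    batch = []
    for _ in range(n):
        x = [rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)]
        batch.append((x, 1 if x[0] + x[1] > shift else 0))
    return batch


def continual_learning(shifts, threshold=0.9, seed=0):
    """Process one labelled batch per period (e.g. per fill).

    The current model is first evaluated on the new batch and is only
    updated, starting from its current weights, if its accuracy is below
    `threshold`. Returns (accuracy_before, accuracy_after, updated) per batch.
    """
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    history = []
    for shift in shifts:
        batch = make_batch(rng, shift)
        acc_before = accuracy(w, b, batch)
        updated = acc_before < threshold
        if updated:
            w, b = sgd_update(w, b, batch)
        history.append((acc_before, accuracy(w, b, batch), updated))
    return history
```

For example, `continual_learning([0.0, 0.0, 1.5])` updates on the first batch, where the untrained model is at chance level, and the third batch, whose class boundary has moved, drops the existing model's accuracy well below the threshold and triggers another incremental update rather than a retraining from scratch.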
In the hardware L1 Trigger case study, this stream could come from the unbiased 40 MHz scouting data, used as truth-level training data for a model. This would address fill-level changes in detector conditions, for which generating a large MC dataset is not feasible. Because the CL solution does not need a large MC dataset, it can react to different conditions much more quickly than a large MC production campaign allows. It can also apply small updates to a model with a stable footprint, whereas training a fresh model could introduce variations in the quantisation or pruning; this is especially important in the low-resource, low-latency environment of the L1 Trigger. The case study considers the identification of failures in the L1 Trigger reconstruction of the primary interaction vertex in the high-pileup HL-LHC environment. This application is chosen because primary vertex finding is important for rejecting particles originating from pileup interactions in downstream algorithms, such as Particle Flow [CMS-CR-2018-401]. Identifying incorrectly reconstructed vertices can be used to improve the efficiency of those downstream algorithms.
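One way to picture "small updates to a stable model footprint" is an update step that keeps a fixed pruning mask and a fixed fixed-point grid, so the updated model needs the same resources as the deployed one. This is an illustrative sketch only: the signed 8-bit format with 4 fractional bits, the mask, the learning rate, and the function names are hypothetical assumptions, not the CMS firmware representation.

```python
# Hypothetical fixed-point format and pruning mask, for illustration only.
FRAC_BITS = 4   # fractional bits: grid step is 2**-4 = 0.0625
TOTAL_BITS = 8  # signed 8-bit word: representable range [-8.0, 7.9375]


def quantise(value, frac_bits=FRAC_BITS, total_bits=TOTAL_BITS):
    """Round to the nearest point of a signed fixed-point grid, saturating.

    Python's round() sends exact halves to the even neighbour.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = max(q_min, min(q_max, round(value * scale)))
    return q / scale


def fixed_footprint_update(weights, grads, mask, lr=0.1):
    """One gradient step that leaves the model footprint unchanged.

    Pruned weights (mask False) stay exactly zero, so no pruned connection
    is revived, and every surviving weight is re-quantised onto the same
    fixed-point grid, so the bit width does not change.
    """
    updated = []
    for w, g, keep in zip(weights, grads, mask):
        if keep:
            updated.append(quantise(w - lr * g))
        else:
            updated.append(0.0)
    return updated


weights = [0.5, 0.0, -0.25, 1.0]
mask = [True, False, True, True]
grads = [1.0, 5.0, -1.0, 0.3]
new_weights = fixed_footprint_update(weights, grads, mask)
print(new_weights)  # [0.375, 0.0, -0.125, 1.0]
```

Because the mask and bit widths never change, the updated weights drop into the same resource budget as the model already deployed; a model retrained from scratch would have to be pruned and quantised again and could end up with a different footprint.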