Cargando…

The CMS monitoring applications for LHC Run 3

Data taking at the Large Hadron Collider (LHC) at CERN restarted in 2022. The CMS experiment relies on a distributed computing infrastructure based on WLCG (Worldwide LHC Computing Grid) to support the LHC Run 3 physics program. The CMS computing infrastructure is highly heterogeneous and relies on...

Descripción completa

Detalles Bibliográficos
Autor principal: Legger, Federica
Lenguaje:eng
Publicado: 2023
Materias:
Acceso en línea:http://cds.cern.ch/record/2869308
_version_ 1780978272066076672
author Legger, Federica
author_facet Legger, Federica
author_sort Legger, Federica
collection CERN
description Data taking at the Large Hadron Collider (LHC) at CERN restarted in 2022. The CMS experiment relies on a distributed computing infrastructure based on WLCG (Worldwide LHC Computing Grid) to support the LHC Run 3 physics program. The CMS computing infrastructure is highly heterogeneous and relies on a set of centrally provided services, such as distributed workload management and data management, and computing resources hosted at almost 150 sites worldwide. Smooth data taking and processing requires all computing subsystems to be fully operational, and available computing and storage resources need to be continuously monitored. During the long shutdown between LHC Run 2 and Run 3, the CMS monitoring infrastructure has undergone major changes to increase the coverage of monitored applications and services, while becoming more sustainable and easier to operate and maintain. The used technologies are based on open-source solutions, either provided by the CERN IT department through the MONIT infrastructure, or managed by the CMS monitoring team. Monitoring applications for distributed workload management, submission infrastructure based on HTCondor, distributed data management, facilities have been ported from mostly custom-built applications to use common data flow and visualization services. Data are mostly stored in no-SQL databases and storage technologies such as ElasticSearch, VictoriaMetrics, InfluxDB and HDFS, and accessed either via programmatic APIs, Apache Spark or Sqoop jobs, or visualized preferentially using Grafana. Most CMS monitoring applications are deployed on Kubernetes clusters to minimize maintenance operations. In this contribution we present the full stack of CMS monitoring services and show how we leveraged the use of common technologies to cover a variety of monitoring applications and cope with the computing challenges of LHC Run 3.
id cern-2869308
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2023
record_format invenio
spelling cern-28693082023-09-04T20:30:53Zhttp://cds.cern.ch/record/2869308engLegger, FedericaThe CMS monitoring applications for LHC Run 3Detectors and Experimental TechniquesData taking at the Large Hadron Collider (LHC) at CERN restarted in 2022. The CMS experiment relies on a distributed computing infrastructure based on WLCG (Worldwide LHC Computing Grid) to support the LHC Run 3 physics program. The CMS computing infrastructure is highly heterogeneous and relies on a set of centrally provided services, such as distributed workload management and data management, and computing resources hosted at almost 150 sites worldwide. Smooth data taking and processing requires all computing subsystems to be fully operational, and available computing and storage resources need to be continuously monitored. During the long shutdown between LHC Run 2 and Run 3, the CMS monitoring infrastructure has undergone major changes to increase the coverage of monitored applications and services, while becoming more sustainable and easier to operate and maintain. The used technologies are based on open-source solutions, either provided by the CERN IT department through the MONIT infrastructure, or managed by the CMS monitoring team. Monitoring applications for distributed workload management, submission infrastructure based on HTCondor, distributed data management, facilities have been ported from mostly custom-built applications to use common data flow and visualization services. Data are mostly stored in no-SQL databases and storage technologies such as ElasticSearch, VictoriaMetrics, InfluxDB and HDFS, and accessed either via programmatic APIs, Apache Spark or Sqoop jobs, or visualized preferentially using Grafana. Most CMS monitoring applications are deployed on Kubernetes clusters to minimize maintenance operations. In this contribution we present the full stack of CMS monitoring services and show how we leveraged the use of common technologies to cover a variety of monitoring applications and cope with the computing challenges of LHC Run 3.CMS-CR-2023-117oai:cds.cern.ch:28693082023-08-16
spellingShingle Detectors and Experimental Techniques
Legger, Federica
The CMS monitoring applications for LHC Run 3
title The CMS monitoring applications for LHC Run 3
title_full The CMS monitoring applications for LHC Run 3
title_fullStr The CMS monitoring applications for LHC Run 3
title_full_unstemmed The CMS monitoring applications for LHC Run 3
title_short The CMS monitoring applications for LHC Run 3
title_sort cms monitoring applications for lhc run 3
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/2869308
work_keys_str_mv AT leggerfederica thecmsmonitoringapplicationsforlhcrun3
AT leggerfederica cmsmonitoringapplicationsforlhcrun3