
Service monitoring in the LHC experiments

Bibliographic Details
Main authors: Barreiro Megino, Fernando; Bernardoff, Vincent; da Silva Gomes, Diego; di Girolamo, Alessandro; Flix, Jos; Kreuzer, Peter; Roiser, Stefan
Language: English
Published: 2012
Online access: https://dx.doi.org/10.1088/1742-6596/396/3/032010
http://cds.cern.ch/record/1565919
Description
Summary: The computing infrastructure of the LHC experiments is distributed across many computing centers in the Worldwide LHC Computing Grid (WLCG [1]) and must run with high reliability. It is therefore crucial to offer a unified view to shifters, who are generally not experts in the services, so that they can follow the status of resources and the health of critical systems and alert the experts whenever a system becomes unavailable. Several experiments have chosen to build their service monitoring on top of the flexible Service Level Status (SLS) framework developed by CERN IT. Based on examples from ATLAS, CMS and LHCb, this contribution describes the complete development process of a service-monitoring instance and explains the deployment models that can be adopted.