Monitoring the LHCb Experiment Computing Infrastructure with NAGIOS

Bibliographic Details
Main authors: Bonaccorsi, E; Neufeld, N
Language: eng
Published: 2009
Online access: http://cds.cern.ch/record/1215280
Description
Summary: LHCb has a large and complex infrastructure consisting of thousands of servers and embedded computers, hundreds of network devices, and many common infrastructure services such as shared storage, login and time services, databases, and others. All operationally critical aspects are integrated into the standard Experiment Control System (ECS), which is based on PVSSII. This enables non-expert operators to carry out first-line reactions. At the lower level, and in particular for monitoring the infrastructure, the Control System itself depends on a secondary infrastructure whose monitoring is based on NAGIOS. We present the design and implementation of the fabric management based on NAGIOS. Care has been taken to complement rather than duplicate functionality available in the Experiment Control System.