
Soft real-time alarm messages for ATLAS TDAQ

Bibliographic Details
Main authors: Darlea, G, Al Shabibi, A, Martin, B, Lehmann Miotto, G
Language: eng
Published: 2010
Online access: https://dx.doi.org/10.1016/j.nima.2009.06.067
http://cds.cern.ch/record/1290343
Description
Summary: The ATLAS TDAQ network consists of three separate Ethernet-based networks (Data, Control and Management) with over 2000 end-nodes. The TDAQ system must be aware of meaningful network failures and events so that it can take effective recovery actions. The first stage of the process is implemented with Spectrum, a commercial network management tool. Spectrum detects and registers all network events, then publishes the information via a CORBA programming interface. A gateway program called NSG (Network Service Gateway) connects to Spectrum through CORBA and exposes a Java RMI interface to its clients. This interface implements a callback mechanism that lets clients subscribe to monitor "interesting" parts of the network. The last stage of the TDAQ network monitoring tool is implemented in a module named DNC (DAQ to Network Connection), which filters the events to be reported to the TDAQ system: it subscribes to the gateway only for the machines that are currently active in the system, and it forwards only the alarms considered important for the current TDAQ data-taking session. The network information is then synthesized and presented in a human-readable format. These messages can be further processed by the shifter in charge, the network expert or the Online Expert System. This article describes the different mechanisms of the chain that transports network events to the front-end user, as well as the constraints and rules that govern the filtering and the final format of the alarm messages.
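
The subscribe-and-filter pattern described in the summary can be illustrated with a minimal sketch. This is not the NSG or DNC code: the real gateway exposes a Java RMI interface, whereas the example below uses plain in-process Python, and every name in it (`Gateway`, `AlarmFilter`, `NetworkEvent`, the severity labels and the message layout) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class NetworkEvent:
    host: str
    severity: str      # hypothetical labels, e.g. "minor" or "critical"
    description: str


Callback = Callable[[NetworkEvent], None]


class Gateway:
    """Hypothetical stand-in for the gateway: per-host callback subscriptions."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, host: str, callback: Callback) -> None:
        self._subscribers.setdefault(host, []).append(callback)

    def publish(self, event: NetworkEvent) -> None:
        # Only clients that subscribed to this host are called back.
        for callback in self._subscribers.get(event.host, []):
            callback(event)


class AlarmFilter:
    """Hypothetical stand-in for the filtering stage."""

    def __init__(self, gateway: Gateway, active_hosts: Iterable[str],
                 important: Iterable[str]) -> None:
        self.alarms: List[str] = []
        self._important = set(important)
        # Subscribe only for machines that are currently active.
        for host in sorted(active_hosts):
            gateway.subscribe(host, self._on_event)

    def _on_event(self, event: NetworkEvent) -> None:
        # Forward only events considered important, as a readable message.
        if event.severity in self._important:
            self.alarms.append(
                f"[{event.severity.upper()}] {event.host}: {event.description}"
            )


def demo() -> List[str]:
    gateway = Gateway()
    alarm_filter = AlarmFilter(gateway, active_hosts={"node-a"},
                               important={"critical"})
    gateway.publish(NetworkEvent("node-a", "critical", "link down"))
    gateway.publish(NetworkEvent("node-a", "minor", "port flap"))
    gateway.publish(NetworkEvent("node-b", "critical", "link down"))
    return alarm_filter.alarms
```

In this sketch, `demo()` keeps only the critical event on the subscribed host; the minor event is dropped by the severity check, and the event on `node-b` never reaches the filter because no subscription exists for that host.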