Orthos, an alarm system for the ALICE DAQ operations

Bibliographic Details
Main authors: Chapeland, Sylvain; Carena, Franco; Carena, Wisla; Chibante Barroso, Vasco; Costa, Filippo; Denes, Ervin; Divia, Roberto; Fuchs, Ulrich; Grigore, Alexandru; Simonetti, Giuseppe; Soos, Csaba; Telesca, Adriana; Vande Vyvre, Pierre; von Haller, Barthelemy
Language: English
Published: 2012
Online access: https://dx.doi.org/10.1088/1742-6596/396/1/012013
http://cds.cern.ch/record/1565930
Description
Summary: ALICE (A Large Ion Collider Experiment) is the heavy-ion detector studying the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). The DAQ (Data Acquisition System) facilities handle the data flow from the detector electronics up to the mass storage. The DAQ system is based on a large farm of commodity hardware consisting of more than 600 devices (Linux PCs, storage, network switches), and controls hundreds of distributed hardware and software components interacting together. This paper presents Orthos, the alarm system used to detect, log, report, and follow up on abnormal situations on the DAQ machines at the experimental area. The main objective of this package is to integrate alarm detection and notification mechanisms with a full-featured issue tracker, in order to prioritize, assign, and fix system failures optimally. This tool relies on a database repository with a logic engine, SQL interfaces to inject or query metrics, and dynamic web pages for user interaction. We describe the system architecture, the technologies used for the implementation, and the integration with existing monitoring tools.
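The summary mentions a database repository, a logic engine, and SQL interfaces for injecting and querying metrics. The sketch below shows one way such a pattern could look, using Python's standard sqlite3 module. It is not the Orthos implementation: the table names (metrics, rules, alarms), the column layout, the simple threshold rule, and the host and metric names are all assumptions made for illustration.

```python
import sqlite3


def open_repository():
    """Create an in-memory repository with hypothetical tables (not the Orthos schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE metrics (
            host  TEXT NOT NULL,
            name  TEXT NOT NULL,
            value REAL NOT NULL,
            PRIMARY KEY (host, name)
        );
        CREATE TABLE rules (
            name      TEXT PRIMARY KEY,
            threshold REAL NOT NULL,
            severity  TEXT NOT NULL
        );
        CREATE TABLE alarms (
            host     TEXT NOT NULL,
            name     TEXT NOT NULL,
            severity TEXT NOT NULL,
            status   TEXT NOT NULL DEFAULT 'open',
            assignee TEXT,
            PRIMARY KEY (host, name)
        );
    """)
    return db


def inject_metric(db, host, name, value):
    """Store the latest value of a metric, overwriting any previous one."""
    db.execute(
        "INSERT OR REPLACE INTO metrics (host, name, value) VALUES (?, ?, ?)",
        (host, name, value),
    )


def evaluate_rules(db):
    """Assumed logic engine: raise one alarm per metric exceeding its threshold.

    INSERT OR IGNORE keeps an existing alarm for the same (host, name) untouched,
    so repeated evaluations do not create duplicates.
    """
    db.execute("""
        INSERT OR IGNORE INTO alarms (host, name, severity)
        SELECT m.host, m.name, r.severity
        FROM metrics AS m
        JOIN rules AS r ON r.name = m.name
        WHERE m.value > r.threshold
    """)


def open_alarms(db):
    """Query the alarms that still need attention."""
    return db.execute(
        "SELECT host, name, severity FROM alarms "
        "WHERE status = 'open' ORDER BY host, name"
    ).fetchall()


if __name__ == "__main__":
    db = open_repository()
    # Hypothetical rule and hosts, chosen only for the example.
    db.execute("INSERT INTO rules VALUES ('disk_usage_percent', 90.0, 'error')")
    inject_metric(db, "daq-node-01", "disk_usage_percent", 95.5)
    inject_metric(db, "daq-node-02", "disk_usage_percent", 40.0)
    evaluate_rules(db)
    evaluate_rules(db)  # second pass adds nothing new
    print(open_alarms(db))
```

In this sketch the alarms table doubles as a minimal issue record (status, assignee), which loosely mirrors the idea of coupling detection with tracking. A real deployment would need history, notifications, and access control, none of which the summary specifies.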