Use of expert system and data analysis technologies in automation of error detection, diagnosis and recovery for ATLAS Trigger-DAQ Controls framework

Bibliographic Details
Main authors: Kazarov, A, Corso Radu, A, Magnoni, L, Lehmann Miotto, G
Language: eng
Published: 2012
Subjects:
Online access: http://cds.cern.ch/record/1455466
Description
Summary: The Trigger and DAQ (Data Acquisition) system of the ATLAS experiment at the LHC at CERN is a very complex distributed computing system, composed of O(10000) applications running on a farm of commodity CPUs. The system has been designed and developed by dozens of software engineers and physicists since the end of the 1990s, and it will be maintained in operational mode during the lifetime of the experiment. The TDAQ system is controlled by the Controls framework, which includes a set of software components and tools used for system configuration, distributed process handling, synchronization of Run Control state transitions, etc. The huge flow of operational monitoring data produced is constantly monitored by operators and experts in order to detect problems or misbehaviour. Given the scale of the system and the rates of data to be analyzed, automating the Controls framework functionality in the areas of operational monitoring, system verification, error detection and recovery is a strong requirement. The paper describes the requirements, choice of technologies, high-level design and some implementation aspects of advanced Controls tools based on knowledge-base technologies. The main aim of these tools is to store and reuse developers' expertise and operational knowledge in order to help TDAQ operators control the system with maximum efficiency throughout the lifetime of the experiment.