
Use of expert system and data analysis technologies in automation of error detection, diagnosis and recovery for ATLAS Trigger-DAQ Controls framework

The Trigger and DAQ (Data Acquisition) system of the ATLAS experiment at the LHC at CERN is a very complex distributed computing system, composed of O(10000) applications running on a farm of commodity CPUs. The system has been designed and developed by dozens of software engineers and physicists since end o...


Bibliographic Details
Main authors: Kazarov, A; Corso Radu, A; Magnoni, L; Lehmann Miotto, G
Language: eng
Published: 2012
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/1455244
_version_ 1780925034081026048
author Kazarov, A
Corso Radu, A
Magnoni, L
Lehmann Miotto, G
author_sort Kazarov, A
collection CERN
description The Trigger and DAQ (Data Acquisition) system of the ATLAS experiment at the LHC at CERN is a very complex distributed computing system, composed of O(10000) applications running on a farm of commodity CPUs. The system has been designed and developed by dozens of software engineers and physicists since the end of the 1990s, and it will be maintained in operational mode during the lifetime of the experiment. The TDAQ system is controlled by the Controls framework, which includes a set of software components and tools used for system configuration, handling of distributed processes, synchronization of Run Control state transitions, etc. The large flow of operational monitoring data produced is constantly monitored by operators and experts in order to detect problems or misbehaviour. Given the scale of the system and the rates of data to be analyzed, automation of the Controls framework functionality in the areas of operational monitoring, system verification, error detection and recovery is a strong requirement. The paper describes the requirements, choice of technologies, high-level design and some implementation aspects of advanced Controls tools based on knowledge-base technologies. The main aim of these tools is to store and reuse developers' expertise and operational knowledge in order to help TDAQ operators control the system with maximum efficiency during the lifetime of the experiment.
id cern-1455244
institution European Organization for Nuclear Research
language eng
publishDate 2012
record_format invenio
spelling cern-1455244 2019-09-30T06:29:59Z http://cds.cern.ch/record/1455244 eng Kazarov, A Corso Radu, A Magnoni, L Lehmann Miotto, G Use of expert system and data analysis technologies in automation of error detection, diagnosis and recovery for ATLAS Trigger-DAQ Controls framework Detectors and Experimental Techniques The Trigger and DAQ (Data Acquisition) system of the ATLAS experiment at the LHC at CERN is a very complex distributed computing system, composed of O(10000) applications running on a farm of commodity CPUs. The system has been designed and developed by dozens of software engineers and physicists since the end of the 1990s, and it will be maintained in operational mode during the lifetime of the experiment. The TDAQ system is controlled by the Controls framework, which includes a set of software components and tools used for system configuration, handling of distributed processes, synchronization of Run Control state transitions, etc. The large flow of operational monitoring data produced is constantly monitored by operators and experts in order to detect problems or misbehaviour. Given the scale of the system and the rates of data to be analyzed, automation of the Controls framework functionality in the areas of operational monitoring, system verification, error detection and recovery is a strong requirement. The paper describes the requirements, choice of technologies, high-level design and some implementation aspects of advanced Controls tools based on knowledge-base technologies. The main aim of these tools is to store and reuse developers' expertise and operational knowledge in order to help TDAQ operators control the system with maximum efficiency during the lifetime of the experiment. ATL-DAQ-SLIDE-2012-364 oai:cds.cern.ch:1455244 2012-06-10
spellingShingle Detectors and Experimental Techniques
Kazarov, A
Corso Radu, A
Magnoni, L
Lehmann Miotto, G
Use of expert system and data analysis technologies in automation of error detection, diagnosis and recovery for ATLAS Trigger-DAQ Controls framework
title Use of expert system and data analysis technologies in automation of error detection, diagnosis and recovery for ATLAS Trigger-DAQ Controls framework
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/1455244