Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution

Effective monitoring and analysis tools are fundamental in modern IT infrastructures to gain insight into overall system behavior and to deal promptly and effectively with failures. In recent years, Complex Event Processing (CEP) technologies have emerged as effective solutions for information pro...

Bibliographic Details
Main author: Magnoni, Luca
Language: eng
Published: 2012
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/1442916
_version_ 1780924700952625152
author Magnoni, Luca
author_facet Magnoni, Luca
author_sort Magnoni, Luca
collection CERN
description Effective monitoring and analysis tools are fundamental in modern IT infrastructures to gain insight into overall system behavior and to deal promptly and effectively with failures. In recent years, Complex Event Processing (CEP) technologies have emerged as effective solutions for information processing in the most disparate fields, from wireless sensor networks to financial analysis. This thesis proposes an innovative approach to monitoring and operating complex, distributed computing systems, with particular reference to the ATLAS Trigger and Data Acquisition (TDAQ) system currently in use at the European Organization for Nuclear Research (CERN). The result of this research, the AAL project, is currently used to provide ATLAS data acquisition operators with automated error detection and intelligent system analysis. The thesis begins by describing the TDAQ system and the controlling architecture, with a focus on the monitoring infrastructure and the expert system used for error detection and automated recovery. It then discusses the limitations of the current approach and how it can be improved to maximize ATLAS TDAQ operational efficiency. Event processing methodologies are then laid out, with a focus on CEP techniques for stream processing and pattern recognition. The open-source Esper engine, the CEP solution adopted by the project, is subsequently analyzed and discussed. Next, the AAL project is introduced as the automated and intelligent monitoring solution developed as the result of this research. AAL requirements and governing factors are listed, with a focus on how stream processing functionalities can enhance the TDAQ monitoring experience. The AAL processing model is then introduced and the architectural choices are justified. Finally, real applications to TDAQ error detection are presented. The main conclusion from this work is that CEP techniques can be successfully applied to detect error conditions and system misbehavior. Moreover, the AAL project demonstrates a real application of CEP concepts for intelligent monitoring in the demanding TDAQ scenario. The adoption of AAL by several TDAQ communities shows that automation and intelligent system analysis were not properly addressed in the previous infrastructure. The results of this thesis will benefit researchers evaluating intelligent monitoring techniques on large-scale distributed computing systems.
id cern-1442916
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2012
record_format invenio
spelling cern-14429162019-09-30T06:29:59Zhttp://cds.cern.ch/record/1442916engMagnoni, LucaIntelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solutionComputing and ComputersEffective monitoring and analysis tools are fundamental in modern IT infrastructures to get insights on the overall system behavior and to deal promptly and effectively with failures. In recent years, Complex Event Processing (CEP) technologies have emerged as effective solutions for information processing from the most disparate fields: from wireless sensor networks to financial analysis. This thesis proposes an innovative approach to monitor and operate complex and distributed computing systems, in particular referring to the ATLAS Trigger and Data Acquisition (TDAQ) system currently in use at the European Organization for Nuclear Research (CERN). The result of this research, the AAL project, is currently used to provide ATLAS data acquisition operators with automated error detection and intelligent system analysis. The thesis begins by describing the TDAQ system and the controlling architecture, with a focus on the monitoring infrastructure and the expert system used for error detection and automated recovery. It then discusses the limitations of the current approach and how it can be improved to maximize the ATLAS TDAQ operational efficiency. Event processing methodologies are then laid out, with a focus on CEP techniques for stream processing and pattern recognition. The open-source Esper engine, the CEP solution adopted by the project is subsequently analyzed and discussed. Next, the AAL project is introduced as the automated and intelligent monitoring solution developed as the result of this research. AAL requirements and governing factors are listed, with a focus on how stream processing functionalities can enhance the TDAQ monitoring experience. The AAL processing model is then introduced and the architectural choices are justified. 
Finally, real applications on TDAQ error detection are presented. The main conclusion from this work is that CEP techniques can be successfully applied to detect error conditions and system misbehavior. Moreover, the AAL project demonstrates a real application of CEP concepts for intelligent monitoring in the demanding TDAQ scenario. The adoption of AAL by several TDAQ communities shows that automation and intelligent system analysis were not properly addressed in the previous infrastructure. The results of this thesis will benefit researchers evaluating intelligent monitoring techniques on large-scale distributed computing system.CERN-THESIS-2012-039oai:cds.cern.ch:14429162012-04-23T15:56:13Z
spellingShingle Computing and Computers
Magnoni, Luca
Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution
title Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution
title_full Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution
title_fullStr Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution
title_full_unstemmed Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution
title_short Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution
title_sort intelligent monitoring and fault diagnosis for atlas tdaq: a complex event processing solution
topic Computing and Computers
url http://cds.cern.ch/record/1442916
work_keys_str_mv AT magnoniluca intelligentmonitoringandfaultdiagnosisforatlastdaqacomplexeventprocessingsolution