
An Assessment of a Model for Error Processing in the CMS Data Acquisition System

The CMS Data Acquisition System consists of O(20000) interdependent services. A system providing exception and application-specific monitoring data is essential for the operation of such a cluster. Due to the number of involved services the amount of monitoring data is higher than a human operator c...

Bibliographic Details
Main authors: Dustdar, Schahram; Gutleber, Johannes; Moser, Roland; Orsini, Luciano
Language: eng
Published: 2009
Subjects: Detectors and Experimental Techniques
Online access: https://dx.doi.org/10.1088/1742-6596/219/2/022039
http://cds.cern.ch/record/1196135
_version_ 1780917063308541952
author Dustdar, Schahram
Gutleber, Johannes
Moser, Roland
Orsini, Luciano
author_facet Dustdar, Schahram
Gutleber, Johannes
Moser, Roland
Orsini, Luciano
author_sort Dustdar, Schahram
collection CERN
description The CMS Data Acquisition System consists of O(20000) interdependent services. A system providing exception and application-specific monitoring data is essential for the operation of such a cluster. Due to the number of involved services the amount of monitoring data is higher than a human operator can handle efficiently. Thus moving the expert-knowledge for error analysis from the operator to a dedicated system is a natural choice. This reduces the number of notifications to the operator for simpler visualization and provides meaningful error cause descriptions and suggestions for possible countermeasures. This paper discusses an architecture of a workflow-based hierarchical error analysis system based on Guardians for the CMS Data Acquisition System. Guardians provide a common interface for error analysis of a specific service or subsystem. To provide effective and complete error analysis, the requirements regarding information sources, monitoring and configuration, are analyzed. Formats for common notification types are defined and a generic Guardian based on Event-Condition-Action rules is presented as a proof-of-concept.
id cern-1196135
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2009
record_format invenio
spelling cern-1196135 2019-09-30T06:29:59Z doi:10.1088/1742-6596/219/2/022039 http://cds.cern.ch/record/1196135 eng Dustdar, Schahram Gutleber, Johannes Moser, Roland Orsini, Luciano An Assessment of a Model for Error Processing in the CMS Data Acquisition System Detectors and Experimental Techniques The CMS Data Acquisition System consists of O(20000) interdependent services. A system providing exception and application-specific monitoring data is essential for the operation of such a cluster. Due to the number of involved services the amount of monitoring data is higher than a human operator can handle efficiently. Thus moving the expert-knowledge for error analysis from the operator to a dedicated system is a natural choice. This reduces the number of notifications to the operator for simpler visualization and provides meaningful error cause descriptions and suggestions for possible countermeasures. This paper discusses an architecture of a workflow-based hierarchical error analysis system based on Guardians for the CMS Data Acquisition System. Guardians provide a common interface for error analysis of a specific service or subsystem. To provide effective and complete error analysis, the requirements regarding information sources, monitoring and configuration, are analyzed. Formats for common notification types are defined and a generic Guardian based on Event-Condition-Action rules is presented as a proof-of-concept. CMS-CR-2009-064 oai:cds.cern.ch:1196135 2009-05-08
spellingShingle Detectors and Experimental Techniques
Dustdar, Schahram
Gutleber, Johannes
Moser, Roland
Orsini, Luciano
An Assessment of a Model for Error Processing in the CMS Data Acquisition System
title An Assessment of a Model for Error Processing in the CMS Data Acquisition System
title_full An Assessment of a Model for Error Processing in the CMS Data Acquisition System
title_fullStr An Assessment of a Model for Error Processing in the CMS Data Acquisition System
title_full_unstemmed An Assessment of a Model for Error Processing in the CMS Data Acquisition System
title_short An Assessment of a Model for Error Processing in the CMS Data Acquisition System
title_sort assessment of a model for error processing in the cms data acquisition system
topic Detectors and Experimental Techniques
url https://dx.doi.org/10.1088/1742-6596/219/2/022039
http://cds.cern.ch/record/1196135