
Non-intrusive Quality Analysis of Monitoring Data

Bibliographic Details
Main authors: Brightwell, M; Ailamaki, Anastasia; Suwalska, Anna
Language: English
Published: 2010
Online access: https://dx.doi.org/10.1007/978-3-642-13818-8_20
http://cds.cern.ch/record/1359261
Collection: CERN
Description: Any large-scale operational system running over a variety of devices requires a monitoring mechanism to assess the health of the overall system. The Technical Infrastructure Monitoring System (TIM) at CERN is one such system; it monitors a wide variety of devices and their properties, such as electricity supplies, device temperatures and liquid flows. Without adequate quality assurance, the data collected from such devices leads to false positives and false negatives, reducing the effectiveness of the monitoring system. The quality must, however, be measured in a non-intrusive way, so that the critical path of the data flow is not affected by the quality computation. The quality computation should also scale to large volumes of incoming data. To address these challenges, we develop a new statistical module, which monitors the data collected by TIM and reports its quality to the operators. The statistical module uses Oracle RDBMS as the underlying store and builds hierarchical summaries on the basic events to scale to the volume of data. It has a built-in fault-tolerance capability to recover from multiple computation failures. In this paper, we describe the design of the statistical module and its usefulness for all parties involved with TIM: the system administrators, the operators using the system to monitor the devices, and the engineers responsible for attaching the devices to the system. We present concrete examples of how the software module has helped with the monitoring, configuration and design of TIM since its introduction last year.
Record ID: cern-1359261
Institution: European Organization for Nuclear Research
Subjects: Engineering
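The description states that the statistical module builds hierarchical summaries on the basic events so that the quality computation scales to the volume of data. The paper's implementation runs on Oracle RDBMS and is not reproduced in this record; the following is only a minimal in-memory sketch of the general roll-up idea, under stated assumptions: each raw event carries a hypothetical `valid` flag, summaries are bucketed first by hour and then by day, and the names `Event`, `Summary`, `summarize_events` and `roll_up` are illustrative, not TIM's.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


# Hypothetical raw event: one value update for a monitored tag, already
# flagged valid/invalid by some upstream check (an assumption, not TIM's schema).
@dataclass(frozen=True)
class Event:
    tag_id: int
    timestamp: datetime
    valid: bool


# Running counts for one (tag, time bucket) cell of a summary level.
@dataclass
class Summary:
    total: int = 0
    invalid: int = 0

    @property
    def invalid_ratio(self) -> float:
        # Fraction of updates flagged invalid; 0.0 for an empty bucket.
        return self.invalid / self.total if self.total else 0.0


def hour_bucket(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)


def day_bucket(ts: datetime) -> datetime:
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def summarize_events(events):
    """Level 1: aggregate raw events into per-(tag, hour) summaries."""
    hourly = defaultdict(Summary)
    for e in events:
        cell = hourly[(e.tag_id, hour_bucket(e.timestamp))]
        cell.total += 1
        if not e.valid:
            cell.invalid += 1
    return dict(hourly)


def roll_up(hourly):
    """Level 2: build per-(tag, day) summaries from the hourly level only,
    without re-reading the raw events."""
    daily = defaultdict(Summary)
    for (tag_id, hour), cell in hourly.items():
        day = daily[(tag_id, day_bucket(hour))]
        day.total += cell.total
        day.invalid += cell.invalid
    return dict(daily)


# Made-up sample data for illustration only.
events = [
    Event(1, datetime(2010, 3, 1, 10, 5), True),
    Event(1, datetime(2010, 3, 1, 10, 40), False),
    Event(1, datetime(2010, 3, 1, 14, 0), True),
    Event(2, datetime(2010, 3, 1, 9, 30), True),
]
hourly = summarize_events(events)
daily = roll_up(hourly)
print(daily[(1, datetime(2010, 3, 1))].invalid_ratio)  # 0.3333333333333333
```

Because each coarser level is computed from the level below it rather than from the raw events, a daily query reads a handful of hourly rows instead of every update, which is the usual motivation for hierarchical summarization; how the paper partitions its levels may differ.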