Real-time complex event processing for cloud resources

Bibliographic Details
Main authors: Adam, M, Cordeiro, C, Field, L, Giordano, D, Magnoni, L
Language: eng
Published: 2017
Online access: https://dx.doi.org/10.1088/1742-6596/898/4/042020
http://cds.cern.ch/record/2297282
Description
Summary: The ongoing integration of clouds into the WLCG raises the need for detailed health and performance monitoring of the virtual resources, in order to prevent degraded service and interruptions due to undetected failures. When working at scale, the diversity of existing monitoring tools can lead to a metric overflow, whereby the operators need to manually collect and correlate data from several monitoring tools and frameworks, resulting in tens of different metrics to be constantly interpreted and analyzed per virtual machine. In this paper we present an Esper-based standalone application which is able to process complex monitoring events coming from various sources and automatically interpret the data in order to issue alarms on the status of the resources, without interfering with the actual resources and data sources. We will describe how this application has been used with both commercial and non-commercial cloud activities, allowing the operators to be alerted quickly and react to misbehaving VMs and LHC experiments' workflows. We will present the pattern analysis mechanisms being used, as well as the surrounding Elastic and REST API interfaces where the alarms are collected and served to users.
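
The kind of rule the summary describes (interpreting a stream of monitoring events and raising an alarm per virtual machine) can be illustrated with a minimal sketch. This is plain Python, not Esper, and it is not the authors' implementation: the `MetricEvent` and `SlidingWindowRule` names, the `cpu_util` metric, the threshold, and the window length are all hypothetical choices made for illustration. It assumes events for a given VM arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricEvent:
    """One monitoring sample (hypothetical schema, not taken from the paper)."""
    vm: str
    metric: str
    value: float
    timestamp: float  # seconds


class SlidingWindowRule:
    """Alarm when a metric's average over a time window falls below a threshold."""

    def __init__(self, metric, threshold, window_seconds, min_samples=3):
        self.metric = metric
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.min_samples = min_samples
        self._windows = defaultdict(deque)  # one window of events per VM

    def process(self, event):
        """Consume one event; return an alarm dict, or None if no alarm fires."""
        if event.metric != self.metric:
            return None
        window = self._windows[event.vm]
        window.append(event)
        # Evict samples that fell out of the time window (assumes ordered input).
        while event.timestamp - window[0].timestamp > self.window_seconds:
            window.popleft()
        if len(window) < self.min_samples:
            return None  # not enough evidence yet
        average = sum(e.value for e in window) / len(window)
        if average < self.threshold:
            return {
                "vm": event.vm,
                "metric": self.metric,
                "average": average,
                "timestamp": event.timestamp,
            }
        return None


if __name__ == "__main__":
    rule = SlidingWindowRule("cpu_util", threshold=5.0, window_seconds=300)
    samples = [
        MetricEvent("vm-1", "cpu_util", 2.0, 0),
        MetricEvent("vm-1", "cpu_util", 3.0, 60),
        MetricEvent("vm-1", "cpu_util", 1.0, 120),
    ]
    for sample in samples:
        alarm = rule.process(sample)
        if alarm is not None:
            print(alarm)
```

The rule only reads the events it is handed, which mirrors the summary's point that the analysis runs without interfering with the monitored resources or data sources. In a fuller setup the returned alarm dictionaries would be forwarded to a store and served over an API, but that part is outside this sketch.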