Monitoring, accounting and automated decision support for the ALICE experiment based on the MonALISA framework

We are developing a general purpose monitoring system for the ALICE experiment, based on the MonALISA framework. MonALISA (Monitoring Agents using a Large Integrated Services Architecture) is a fully distributed system with no single point of failure that is able to collect, store monitoring informa...

Bibliographic Details
Main authors: Cirstoiu, C; Grigoras, C; Betev, L; Saiz, P; Peters, A J; Muraru, A; Voicu, R; Legrand, I
Language: English
Published: 2007
Subjects: Detectors and Experimental Techniques; Computing and Computers
Online access: http://cds.cern.ch/record/1120925
author Cirstoiu, C
Grigoras, C
Betev, L
Saiz, P
Peters, A J
Muraru, A
Voicu, R
Legrand, I
collection CERN
description We are developing a general-purpose monitoring system for the ALICE experiment, based on the MonALISA framework. MonALISA (Monitoring Agents using a Large Integrated Services Architecture) is a fully distributed system with no single point of failure. It collects and stores monitoring information and presents it as meaningful perspectives and synthetic views of the status and trends of the entire system. Furthermore, agents can use it to take automated operational decisions. Monitoring information is gathered locally from all the components running at each site. The entire flow of information is aggregated at site level by a MonALISA service and then collected and presented in various forms by a central MonALISA Repository. Based on this information, other services take operational actions such as alerts, triggers, service restarts and automatic submission of production jobs or transfers. The system monitors all the components: computer clusters (all major parameters of each computing node); job status and consumed resources (CPU, in both time and SpecInt2k units, memory, disk usage); job network traffic while reading and writing files with xrootd; service availability, with details in case of failures (both AliEn and LCG services, proxy lifetimes); storage, with detailed information on the number of files, available space, and staging and migration operations; and FTD/FTS transfers. The system has been reliable and functional for more than two years and serves as the main view of the ALICE Grid. Our focus is now on using the monitoring information to develop higher-level services that can take more intelligent operational decisions.
id cern-1120925
institution European Organization for Nuclear Research
language eng
publishDate 2007
record_format invenio
title Monitoring, accounting and automated decision support for the ALICE experiment based on the MonALISA framework
topic Detectors and Experimental Techniques
Computing and Computers
url http://cds.cern.ch/record/1120925