
Automated agents for management and control of the ALICE Computing Grid

A complex software environment such as the ALICE Computing Grid infrastructure requires permanent control and management for the large set of services involved. Automating control procedures reduces the human interaction with the various components of the system and yields better availability of the...


Bibliographic Details
Main authors: Grigoras, C, Betev, L, Carminati, F, Legrand, I, Voicu, R
Language: eng
Published: 2010
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/219/6/062050
http://cds.cern.ch/record/1270843
_version_ 1780920205450412032
author Grigoras, C
Betev, L
Carminati, F
Legrand, I
Voicu, R
author_facet Grigoras, C
Betev, L
Carminati, F
Legrand, I
Voicu, R
author_sort Grigoras, C
collection CERN
description A complex software environment such as the ALICE Computing Grid infrastructure requires permanent control and management for the large set of services involved. Automating control procedures reduces the human interaction with the various components of the system and yields better availability of the overall system. In this paper we will present how we used the MonALISA framework to gather, store and display the relevant metrics in the entire system from central and remote site services. We will also show the automatic local and global procedures that are triggered by the monitored values. Decision-taking agents are used to restart remote services, alert the operators in case of problems that cannot be automatically solved, submit production jobs, replicate and analyze raw data, resource load-balance and other control mechanisms that optimize the overall work flow and simplify day-to-day operations. Synthetic graphical views for all operational parameters, correlations, state of services and applications as well as the full history of all monitoring metrics are available for the entire system that now encompasses 85 sites all over the world, more than 14000 CPU cores and 10PB of storage.
id cern-1270843
institution European Organization for Nuclear Research
language eng
publishDate 2010
record_format invenio
spelling cern-12708432022-08-17T13:25:17Zdoi:10.1088/1742-6596/219/6/062050http://cds.cern.ch/record/1270843engGrigoras, CBetev, LCarminati, FLegrand, IVoicu, RAutomated agents for management and control of the ALICE Computing GridComputing and ComputersA complex software environment such as the ALICE Computing Grid infrastructure requires permanent control and management for the large set of services involved. Automating control procedures reduces the human interaction with the various components of the system and yields better availability of the overall system. In this paper we will present how we used the MonALISA framework to gather, store and display the relevant metrics in the entire system from central and remote site services. We will also show the automatic local and global procedures that are triggered by the monitored values. Decision-taking agents are used to restart remote services, alert the operators in case of problems that cannot be automatically solved, submit production jobs, replicate and analyze raw data, resource load-balance and other control mechanisms that optimize the overall work flow and simplify day-to-day operations. Synthetic graphical views for all operational parameters, correlations, state of services and applications as well as the full history of all monitoring metrics are available for the entire system that now encompasses 85 sites all over the world, more than 14000 CPU cores and 10PB of storage.oai:cds.cern.ch:12708432010
spellingShingle Computing and Computers
Grigoras, C
Betev, L
Carminati, F
Legrand, I
Voicu, R
Automated agents for management and control of the ALICE Computing Grid
title Automated agents for management and control of the ALICE Computing Grid
title_full Automated agents for management and control of the ALICE Computing Grid
title_fullStr Automated agents for management and control of the ALICE Computing Grid
title_full_unstemmed Automated agents for management and control of the ALICE Computing Grid
title_short Automated agents for management and control of the ALICE Computing Grid
title_sort automated agents for management and control of the alice computing grid
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/219/6/062050
http://cds.cern.ch/record/1270843
work_keys_str_mv AT grigorasc automatedagentsformanagementandcontrolofthealicecomputinggrid
AT betevl automatedagentsformanagementandcontrolofthealicecomputinggrid
AT carminatif automatedagentsformanagementandcontrolofthealicecomputinggrid
AT legrandi automatedagentsformanagementandcontrolofthealicecomputinggrid
AT voicur automatedagentsformanagementandcontrolofthealicecomputinggrid