Strategies for experiment-specific monitoring in the Grid

This contribution describes how the LHC experiments implement their own Grid resource monitoring, either by internally developed tools, or by reusing tools used for Grid operations, like the Service Availability Monitor (SAM) used for the EGEE operations. The LHC experiments perform most, if not all...

Bibliographic Details
Main authors: Mendez Lorenzo, Patricia; Sciaba, Andrea; Campana, Simone; Santinelli, Roberto; Lanciotti, Elisa; Miccio, Enzo; Magini, Nicolo; Di Girolamo, Alessandro
Language: English
Published: 2008
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/1123345
author Mendez Lorenzo, Patricia
Sciaba, Andrea
Campana, Simone
Santinelli, Roberto
Lanciotti, Elisa
Miccio, Enzo
Magini, Nicolo
Di Girolamo, Alessandro
author_sort Mendez Lorenzo, Patricia
collection CERN
description This contribution describes how the LHC experiments implement their own Grid resource monitoring, either with internally developed tools or by reusing tools built for Grid operations, such as the Service Availability Monitor (SAM) used for EGEE operations. The LHC experiments perform most, if not all, of their computing activities on Grid resources. This requires an accurate and up-to-date picture of the status of the Grid services they use and of the services specific to each experiment. A common way to achieve this is to periodically execute tests on the services, where the functionality tested may differ from one virtual organization (VO) to another. The SAM framework, developed for EGEE operations, can easily be used to run arbitrary tests and publish their results, from basic functionality tests to high-level operations drawn from real production activities. This contribution describes in detail how the monitoring system of each LHC experiment has taken advantage of SAM. The work covered by this contribution has greatly improved the efficiency with which the LHC experiments use Grid resources. More accurate and prompt discovery of problems allows them to be fixed as soon as they appear, thus increasing the overall reliability of the Grid resources from the experiments' point of view. This information also allows the experiment applications to make better decisions whenever they are given a choice of resources, for example by avoiding sending jobs to problematic or overloaded computing resources. The need to commission the computing resources available to the experiments before the start of LHC data taking in 2008 requires a constant effort to improve the quality of the monitoring information. This is why the work described here is still ongoing, and we foresee increasing use of the SAM framework by the experiments, both by expanding the current tests and by adding new tests for services that are not yet tested with this methodology.
id cern-1123345
institution European Organization for Nuclear Research
language eng
publishDate 2008
record_format invenio
spelling cern-1123345 2019-09-30T06:29:59Z http://cds.cern.ch/record/1123345 eng oai:cds.cern.ch:1123345 2008
title Strategies for experiment-specific monitoring in the Grid
topic Computing and Computers
url http://cds.cern.ch/record/1123345