Strategies for experiment-specific monitoring in the Grid
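Main authors: | Mendez Lorenzo, Patricia; Sciaba, Andrea; Campana, Simone; Santinelli, Roberto; Lanciotti, Elisa; Miccio, Enzo; Magini, Nicolo; Di Girolamo, Alessandro |
---|---|
Language: | English |
Published: | 2008 |
Subjects: | Computing and Computers |
Online access: | http://cds.cern.ch/record/1123345 |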
Field | Value |
---|---|
author | Mendez Lorenzo, Patricia; Sciaba, Andrea; Campana, Simone; Santinelli, Roberto; Lanciotti, Elisa; Miccio, Enzo; Magini, Nicolo; Di Girolamo, Alessandro |
collection | CERN |
description | This contribution describes how the LHC experiments implement their own Grid resource monitoring, either with internally developed tools or by reusing tools built for Grid operations, such as the Service Availability Monitor (SAM) used for EGEE operations. The LHC experiments perform most, if not all, of their computing activities on Grid resources. This requires an accurate and up-to-date picture of the status of the Grid services they use, and of the services that are specific to each experiment. A common way to achieve this is to periodically execute tests on the services, where the functionalities tested may differ from one virtual organization (VO) to another. The SAM framework, developed for EGEE operations, can easily be used to run arbitrary tests and publish their results, ranging from basic functionality tests to high-level operations taken from real production activities. This contribution describes in detail how the monitoring system of each LHC experiment has taken advantage of SAM. The work covered by this contribution has greatly improved the efficiency with which the LHC experiments use Grid resources. More accurate and prompt discovery of problems makes it possible to fix them as soon as they appear, thus increasing the overall reliability of the Grid resources from the experiments' point of view. This information also allows experiment applications to make better decisions whenever they have a choice of resources, for example by avoiding sending jobs to problematic or overloaded computing resources. The need to commission the computing resources available to the experiments before the start of LHC data taking in 2008 requires a constant effort to improve the quality of the monitoring information. This is why the work described here is still ongoing, and we foresee increasing use of the SAM framework by the experiments, both by expanding the current tests and by adding new tests for services not yet tested with this methodology. |
id | cern-1123345 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2008 |
title | Strategies for experiment-specific monitoring in the Grid |
topic | Computing and Computers |
url | http://cds.cern.ch/record/1123345 |
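The abstract describes a common pattern: tests are run periodically against each service, their results are recorded, and experiment tools use those results to avoid sending jobs to sites where tests fail. The following is a minimal Python sketch of that last selection step under stated assumptions; it is not the SAM API or any experiment's actual implementation. The site names, test names, the `CRITICAL_TESTS` mapping, and the rule that a WARNING is tolerated while an ERROR or a missing result excludes a site are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class ProbeResult:
    site: str      # hypothetical site name
    test: str      # hypothetical test name, e.g. "job-submit"
    status: Status


# Hypothetical per-VO choice of critical tests: the functionalities that
# matter can differ from one VO to another.
CRITICAL_TESTS = {
    "vo-a": {"job-submit", "file-copy"},
    "vo-b": {"job-submit"},
}


def site_is_usable(latest: list[ProbeResult], site: str, vo: str) -> bool:
    """True if every critical test for `vo` has a result at `site` and none
    of those results is ERROR (WARNING is tolerated).

    Assumes `latest` holds only the most recent result per (site, test).
    """
    by_test = {r.test: r.status for r in latest if r.site == site}
    for test in CRITICAL_TESTS[vo]:
        status = by_test.get(test)
        if status is None or status is Status.ERROR:
            return False
    return True


def usable_sites(latest: list[ProbeResult], vo: str) -> list[str]:
    """Sites that a job-submission tool for `vo` could choose from, sorted by name."""
    sites = {r.site for r in latest}
    return sorted(s for s in sites if site_is_usable(latest, s, vo))


if __name__ == "__main__":
    latest = [
        ProbeResult("SITE-1", "job-submit", Status.OK),
        ProbeResult("SITE-1", "file-copy", Status.ERROR),
        ProbeResult("SITE-2", "job-submit", Status.OK),
        ProbeResult("SITE-2", "file-copy", Status.WARNING),
        ProbeResult("SITE-3", "job-submit", Status.ERROR),
    ]
    print(usable_sites(latest, "vo-a"))  # ['SITE-2']
    print(usable_sites(latest, "vo-b"))  # ['SITE-1', 'SITE-2']
```

With the sample data, a VO that depends on file transfers drops SITE-1 because its file-copy test failed, while a VO that only needs job submission can still use it. This shows how the same published test results can lead to different resource choices per VO.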