Grid reliability

We offer a system to track the efficiency of different components of the GRID. We can study the performance of both the WMS and the data transfers. At the moment, we have set up different parts of the system for ALICE, ATLAS, CMS and LHCb. None of the components that we have developed are VO spec...
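As a rough illustration of what tracking efficiency could look like, the sketch below computes a per-site success rate from job attempts and lists the jobs that failed at least once but were recovered by resubmission (the full description mentions resubmitting a failed job to a different site). The record layout, the site and job names, and both function names are assumptions made for this example; they are not taken from the system described in the record.

```python
from collections import defaultdict

# Hypothetical job attempts: (job_id, site, attempt_succeeded).
# The data is invented purely for illustration.
ATTEMPTS = [
    ("job-1", "site-A", False),  # first attempt fails
    ("job-1", "site-B", True),   # resubmitted to another site, succeeds
    ("job-2", "site-A", True),
    ("job-3", "site-B", False),
]


def site_efficiency(attempts):
    """Return the fraction of successful attempts for each site."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for _job, site, ok in attempts:
        totals[site] += 1
        if ok:
            successes[site] += 1
    return {site: successes[site] / totals[site] for site in totals}


def recovered_jobs(attempts):
    """Return jobs that failed at least once but eventually succeeded."""
    failed, succeeded = set(), set()
    for job, _site, ok in attempts:
        (succeeded if ok else failed).add(job)
    return failed & succeeded


if __name__ == "__main__":
    print(site_efficiency(ATTEMPTS))
    print(recovered_jobs(ATTEMPTS))
```

The second function captures the point that such failures are invisible to the end user, who only sees the final success, yet still matter to the site administrator.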

Bibliographic Details
Main authors:	Saiz, P, Gaidioz, B, Rocha, R, Andreeva, J
Language:	eng
Published:	2007
Subjects:	Computing and Computers
Online access:	http://cds.cern.ch/record/1120924

_version_	1780914569192931328
author	Saiz, P Gaidioz, B Rocha, R Andreeva, J
author_facet	Saiz, P Gaidioz, B Rocha, R Andreeva, J
author_sort	Saiz, P
collection	CERN
description	We offer a system to track the efficiency of different components of the GRID. It can be used to study the performance of both the WMS and the data transfers. At the moment, we have set up different parts of the system for ALICE, ATLAS, CMS and LHCb. None of the components that we have developed is VO specific, so it would be easy to deploy them for any other VO. Our main goal is to improve the reliability of the GRID. The idea is to discover problems as soon as possible after they happen and to inform those responsible. Since we study the jobs and transfers issued by real users, we see the same problems that users see. In fact, we see even more problems than the end user does, since we also follow up the errors that GRID components can overcome by themselves (for instance, by resubmitting a failed job to a different site). This information is very useful to site and VO administrators: they can find out the efficiency of their sites and, in case of failures, the problems that they have to solve. The reports that we provide are also of interest to the COD, since the errors might not be VO specific. The whole system is based on studying the actions that users perform, so its first and most important dependency is on monitoring systems. We interface with the DASHBOARD, which hides the differences between the heterogeneous sources of data (such as RGMA, ICXML or MonALISA). Another service that is very important for the effectiveness of Grid reliability is GGUS, which handles the submission and tracking of tickets. The interaction with GGUS has already been tested with a manual procedure. Since the result was very encouraging, we are working on ways of automating it. The main problem that we have found so far is the lack of communication between the new gLite RB and RGMA: jobs that go through these resource brokers do not publish their status, which makes our task impossible. Another possible problem is the confidentiality of the data. To address it, we anonymise the jobs and transfers, since we are only interested in the sequence of statuses that a job or transfer goes through.
id	cern-1120924
institution	European Organization for Nuclear Research
language	eng
publishDate	2007
record_format	invenio
spellingShingle	Computing and Computers Saiz, P Gaidioz, B Rocha, R Andreeva, J Grid reliability
title	Grid reliability
title_full	Grid reliability
title_fullStr	Grid reliability
title_full_unstemmed	Grid reliability
title_short	Grid reliability
title_sort	grid reliability
topic	Computing and Computers
url	http://cds.cern.ch/record/1120924
work_keys_str_mv	AT saizp gridreliability AT gaidiozb gridreliability AT rochar gridreliability AT andreevaj gridreliability

Similar items