
Towards higher reliability of CMS Computing Facilities



Bibliographic Details
Main author: Flix Molina, Jose
Language: English
Published: 2012
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/1458462
author Flix Molina, Jose
collection CERN
description The CMS experiment has adopted a computing system in which resources are distributed worldwide across more than 50 sites. Operating the system requires stable and reliable behavior of the underlying infrastructure. CMS has established procedures to extensively test all relevant aspects of a site and its capability to sustain the various CMS computing workflows at the required scale. The Site Readiness monitoring infrastructure has been instrumental in understanding how the system as a whole was improving towards LHC operations, in measuring the reliability of sites when running CMS activities, and in providing sites with the information they need to solve any problems that arise. This paper reviews the complete automation of the Site Readiness program, including the description of the monitoring tools and their inclusion in the Site Status Board (SSB), the performance checks, the use of tools such as HammerCloud, and their impact on improving the overall reliability of the Grid from the point of view of the CMS computing system. Based on these results, CMS automatically excludes sites from running workflows in order to maximize workflow efficiency. The performance against these tests observed at the sites during the first years of LHC running is also reviewed.
id cern-1458462
institution European Organization for Nuclear Research
language eng
publishDate 2012
record_format invenio
spelling cern-1458462 | 2019-09-30T06:29:59Z | http://cds.cern.ch/record/1458462 | eng | Flix Molina, Jose | Towards higher reliability of CMS Computing Facilities | Detectors and Experimental Techniques | CMS-CR-2012-083 | oai:cds.cern.ch:1458462 | 2012-05-11
title Towards higher reliability of CMS Computing Facilities
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/1458462
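The abstract describes sites being tested continuously and then automatically excluded from CMS workflows based on the results. The sketch below shows one way such a readiness rule could work. It is a minimal, hypothetical illustration: the daily test summary (`DailyTests`), the HammerCloud success threshold, the 7-day sliding window, the required fraction of good days, and the site names are all assumptions for illustration, not the criteria used by the CMS Site Readiness program.

```python
"""Hypothetical Site Readiness style check (illustrative only).

All thresholds, field names and the exclusion rule below are
assumptions; they are NOT the actual CMS Site Readiness criteria.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DailyTests:
    """Assumed per-day summary of the tests run against one site."""
    sam_ok: bool                 # site availability tests passed (assumed boolean summary)
    hammercloud_success: float   # fraction of HammerCloud test jobs that succeeded
    links_ok: bool               # data-transfer links in good state (assumed)


HC_THRESHOLD = 0.80      # assumed minimum HammerCloud success fraction for a good day
WINDOW_DAYS = 7          # assumed look-back window, in days
MIN_OK_FRACTION = 0.80   # assumed fraction of good days required within the window


def day_is_ok(day: DailyTests) -> bool:
    """A day counts as good only if every (hypothetical) test passes."""
    return day.sam_ok and day.links_ok and day.hammercloud_success >= HC_THRESHOLD


def site_ready(history: list[DailyTests]) -> bool:
    """Ready if enough of the most recent WINDOW_DAYS days were good."""
    recent = history[-WINDOW_DAYS:]
    if not recent:
        return False  # no test data: treat the site as not ready
    ok_days = sum(day_is_ok(day) for day in recent)
    return ok_days / len(recent) >= MIN_OK_FRACTION


def sites_to_exclude(histories: dict[str, list[DailyTests]]) -> list[str]:
    """Names of sites that fail the readiness rule, sorted alphabetically."""
    return sorted(name for name, history in histories.items() if not site_ready(history))


if __name__ == "__main__":
    good = DailyTests(sam_ok=True, hammercloud_success=0.95, links_ok=True)
    bad = DailyTests(sam_ok=True, hammercloud_success=0.50, links_ok=True)
    # Hypothetical site names: 7/7 good days vs. 5/7 good days.
    histories = {"T2_A": [good] * 7, "T2_B": [good] * 5 + [bad] * 2}
    print(sites_to_exclude(histories))  # prints ['T2_B']
```

Using a sliding window rather than a single day's result means one bad day does not immediately remove a site, while a sustained run of failures does. A production system would also need a rule for re-admitting sites once their tests recover; in this sketch that happens implicitly as bad days fall out of the window.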