Grid site availability evaluation and monitoring at CMS

The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) uses distributed grid computing to store, process, and analyse the vast quantity of scientific data recorded every year. The computing resources are grouped into sites and organized in a tiered structure. Each site provide...

Bibliographic Details
Main authors: Lyons, Gaston; Maciulaitis, Rokas; Bagliesi, Giuseppe; Lammel, Stephan; Sciabà, Andrea
Language: English
Published: 2017
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/898/9/092014
http://cds.cern.ch/record/2296664
_version_ 1780956907943493632
author Lyons, Gaston
Maciulaitis, Rokas
Bagliesi, Giuseppe
Lammel, Stephan
Sciabà, Andrea
author_facet Lyons, Gaston
Maciulaitis, Rokas
Bagliesi, Giuseppe
Lammel, Stephan
Sciabà, Andrea
author_sort Lyons, Gaston
collection CERN
description The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) uses distributed grid computing to store, process, and analyse the vast quantity of scientific data recorded every year. The computing resources are grouped into sites and organized in a tiered structure. Each site provides computing and storage to the CMS computing grid. Over a hundred sites worldwide contribute resources, ranging from a hundred to well over ten thousand computing cores and from tens of TBytes to tens of PBytes of storage. In such a large computing setup, scheduled and unscheduled outages occur continually and must not significantly impact data handling, processing, and analysis. Unscheduled capacity and performance reductions need to be detected promptly and corrected. CMS developed a sophisticated site evaluation and monitoring system for Run 1 of the LHC based on tools of the Worldwide LHC Computing Grid. For Run 2 of the LHC, the site evaluation and monitoring system is being overhauled to enable faster detection of and reaction to failures and a more dynamic handling of computing resources. Enhancements are planned to better distinguish site issues from central service issues and to make evaluations more transparent and informative to site support staff.
id oai-inspirehep.net-1638611
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2017
record_format invenio
spelling oai-inspirehep.net-1638611 2021-02-09T10:06:08Z oai:inspirehep.net:1638611
spellingShingle Computing and Computers
Lyons, Gaston
Maciulaitis, Rokas
Bagliesi, Giuseppe
Lammel, Stephan
Sciabà, Andrea
Grid site availability evaluation and monitoring at CMS
title Grid site availability evaluation and monitoring at CMS
title_full Grid site availability evaluation and monitoring at CMS
title_fullStr Grid site availability evaluation and monitoring at CMS
title_full_unstemmed Grid site availability evaluation and monitoring at CMS
title_short Grid site availability evaluation and monitoring at CMS
title_sort grid site availability evaluation and monitoring at cms
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/898/9/092014
http://cds.cern.ch/record/2296664
work_keys_str_mv AT lyonsgaston gridsiteavailabilityevaluationandmonitoringatcms
AT maciulaitisrokas gridsiteavailabilityevaluationandmonitoringatcms
AT bagliesigiuseppe gridsiteavailabilityevaluationandmonitoringatcms
AT lammelstephan gridsiteavailabilityevaluationandmonitoringatcms
AT sciabaandrea gridsiteavailabilityevaluationandmonitoringatcms