Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud

Frequent validation and stress testing of the network, storage and CPU resources of a grid site is essential to achieve high performance and reliability. HammerCloud was previously introduced with the goals of enabling VO- and site-administrators to run such tests in an automated or...

Bibliographic Details
Main author: Van Der Ster, Daniel Colin
Language: eng
Published: 2012
Subjects: Conferences
Online access: http://cds.cern.ch/record/1460677
_version_ 1780925253365530624
author Van Der Ster, Daniel Colin
author_facet Van Der Ster, Daniel Colin
author_sort Van Der Ster, Daniel Colin
collection CERN
description Frequent validation and stress testing of the network, storage and CPU resources of a grid site is essential to achieve high performance and reliability. HammerCloud was previously introduced with the goals of enabling VO- and site-administrators to run such tests in an automated or on-demand manner. The ATLAS, CMS and LHCb experiments have all developed VO plugins for the service and have successfully integrated it into their grid operations infrastructures. This work will present the experience in running HammerCloud at full scale for more than 3 years and present solutions to the scalability issues faced by the service. First, we will show the particular challenges faced when integrating with CMS and LHCb offline computing, including customized dashboards to show site validation reports for the VOs and a new API to tightly integrate with the LHCbDIRAC Resource Status System. Next, a study of the automatic site exclusion component used by ATLAS will be presented along with results for tuning the exclusion policies. A study of the historical test results for ATLAS, CMS and LHCb will be presented, including comparisons between the experiments' grid availabilities and a search for site-based or temporal failure correlations. Finally, we will look to future plans that will allow users to gain new insights into the test results; these include developments to allow increased testing concurrency, increased scale in the number of metrics recorded per test job (up to hundreds), and increased scale in the historical job information (up to many millions of jobs per VO).
id cern-1460677
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2012
record_format invenio
spelling cern-1460677 2022-11-02T22:23:34Z http://cds.cern.ch/record/1460677 eng Van Der Ster, Daniel Colin Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud Computing in High Energy and Nuclear Physics (CHEP) 2012 Conferences oai:cds.cern.ch:1460677 2012
spellingShingle Conferences
Van Der Ster, Daniel Colin
Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud
title Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud
title_full Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud
title_fullStr Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud
title_full_unstemmed Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud
title_short Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud
title_sort experience in grid site testing for atlas, cms and lhcb with hammercloud
topic Conferences
url http://cds.cern.ch/record/1460677
work_keys_str_mv AT vandersterdanielcolin experienceingridsitetestingforatlascmsandlhcbwithhammercloud
AT vandersterdanielcolin computinginhighenergyandnuclearphysicschep2012