
Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud



Bibliographic Details
Main authors:	Van der Ster, D; Elmsheuser, J; Medrano Llamas, R; Legger, F; Sciaba, A; Sciacca, G; Ubeda García, M
Language:	English
Published:	2012
Subjects:	Computing and Computers
Online access:	http://cds.cern.ch/record/1457967

Record Details
Record ID:	cern-1457967
Collection:	CERN
Institution:	European Organization for Nuclear Research
Report number:	CERN-IT-Note-2012-009
Date:	2012-05-16
OAI identifier:	oai:cds.cern.ch:1457967

Description
Frequent validation and stress testing of the network, storage and CPU resources of a grid site is essential to achieve high performance and reliability. HammerCloud was previously introduced with the goals of enabling VO- and site-administrators to run such tests in an automated or on-demand manner. The ATLAS, CMS and LHCb experiments have all developed VO plugins for the service and have successfully integrated it into their grid operations infrastructures. This work will present the experience in running HammerCloud at full scale for more than 3 years and present solutions to the scalability issues faced by the service. First, we will show the particular challenges faced when integrating with CMS and LHCb offline computing, including customized dashboards to show site validation reports for the VOs and a new API to tightly integrate with the LHCbDIRAC Resource Status System. Next, a study of the automatic site exclusion component used by ATLAS will be presented along with results for tuning the exclusion policies. A study of the historical test results for ATLAS, CMS and LHCb will be presented, including comparisons between the experiments' grid availabilities and a search for site-based or temporal failure correlations. Finally, we will look to future plans that will allow users to gain new insights into the test results; these include developments to allow increased testing concurrency, increased scale in the number of metrics recorded per test job (up to hundreds), and increased scale in the historical job information (up to many millions of jobs per VO).

