
Improving ATLAS grid site reliability with functional tests using HammerCloud

With the exponential growth of LHC (Large Hadron Collider) data in 2011, and more coming in 2012, distributed computing has become the established way to analyse collider data. The ATLAS grid infrastructure includes more than 80 sites worldwide, ranging from large national computing centers to small...


Bibliographic Details
Main authors: Legger, F, Elmsheuser, J, Medrano Llamas, R, Sciacca, G, Van der Ster, D C
Language: eng
Published: 2012
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/1442601
_version_ 1780924694508077056
author Legger, F
Elmsheuser, J
Medrano Llamas, R
Sciacca, G
Van der Ster, D C
author_sort Legger, F
collection CERN
description With the exponential growth of LHC (Large Hadron Collider) data in 2011, and more coming in 2012, distributed computing has become the established way to analyse collider data. The ATLAS grid infrastructure includes more than 80 sites worldwide, ranging from large national computing centers to smaller university clusters. These facilities are used for data reconstruction and simulation, which are centrally managed by the ATLAS production system, and for distributed user analysis. To ensure the smooth operation of such a complex system, all sites must be tested regularly to validate their ability to execute user and production jobs successfully. We report on the development, optimization and results of an automated functional testing suite built on the HammerCloud framework. Functional tests are short, lightweight applications that cover typical user analysis and production workflows and are periodically submitted to all ATLAS grid sites. The results of these tests are collected and used to evaluate site performance. Sites that fail the tests or are unable to run them are automatically excluded from the PanDA brokerage system, which prevents user and production jobs from being sent to problematic sites.
id cern-1442601
institution European Organization for Nuclear Research
language eng
publishDate 2012
record_format invenio
spelling cern-1442601 | 2019-09-30T06:29:59Z | http://cds.cern.ch/record/1442601 | eng | Legger, F; Elmsheuser, J; Medrano Llamas, R; Sciacca, G; Van der Ster, D C | Improving ATLAS grid site reliability with functional tests using HammerCloud | Detectors and Experimental Techniques | ATL-SOFT-SLIDE-2012-114 | oai:cds.cern.ch:1442601 | 2012-04-23
title Improving ATLAS grid site reliability with functional tests using HammerCloud
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/1442601
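
The description states that sites which fail the functional tests, or cannot run them at all, are automatically excluded from brokerage. The following is a minimal sketch of what such an exclusion rule could look like. It is not the HammerCloud or PanDA implementation: the `JobOutcome` type, the status labels, the `sites_to_exclude` function and the 0.8 success-rate threshold are all assumptions made for illustration, and the record does not specify the actual policy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobOutcome:
    """One functional-test job result (hypothetical structure)."""
    site: str    # site name, invented for this sketch
    status: str  # "ok", "failed", or "not_run" (labels are assumptions)


def sites_to_exclude(all_sites, outcomes, min_success_rate=0.8):
    """Return the sorted names of sites to take out of brokerage.

    A site is excluded when no test job finished there (it was unable to
    run the tests) or when its success rate is below min_success_rate.
    The threshold value is an assumption, not a documented setting.
    """
    statuses_by_site = defaultdict(list)
    for outcome in outcomes:
        statuses_by_site[outcome.site].append(outcome.status)

    excluded = []
    for site in all_sites:
        finished = [s for s in statuses_by_site.get(site, [])
                    if s in ("ok", "failed")]
        if not finished:
            # No test job completed: treat the site as unable to run tests.
            excluded.append(site)
            continue
        success_rate = finished.count("ok") / len(finished)
        if success_rate < min_success_rate:
            excluded.append(site)
    return sorted(excluded)


if __name__ == "__main__":
    demo = [JobOutcome("SITE_A", "ok"), JobOutcome("SITE_B", "failed")]
    print(sites_to_exclude(["SITE_A", "SITE_B", "SITE_C"], demo))
```

Passing the full site list separately from the outcomes lets the sketch also catch sites that returned no results at all, which corresponds to the "unable to run the tests" case in the description.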