HammerCloud: A Stress Testing System for Distributed Analysis
Distributed analysis of LHC data is an I/O-intensive activity which places large demands on the internal network, storage, and local disks at remote computing facilities. Commissioning and maintaining a site to provide an efficient distributed analysis service is therefore a challenge which can be a...
Main authors: | van der Ster, Daniel C; Elmsheuser, Johannes; Ubeda Garcia, Mario; Paladin, Massimo |
---|---|
Language: | eng |
Published: | 2011 |
Subjects: | Computing and Computers |
Online access: | http://cds.cern.ch/record/1319378 |
_version_ | 1780921460241465344 |
---|---|
author | van der Ster, Daniel C Elmsheuser, Johannes Ubeda Garcia, Mario Paladin, Massimo |
author_facet | van der Ster, Daniel C Elmsheuser, Johannes Ubeda Garcia, Mario Paladin, Massimo |
author_sort | van der Ster, Daniel C |
collection | CERN |
description | Distributed analysis of LHC data is an I/O-intensive activity which places large demands on the internal network, storage, and local disks at remote computing facilities. Commissioning and maintaining a site to provide an efficient distributed analysis service is therefore a challenge which can be aided by tools to help evaluate a variety of infrastructure designs and configurations. HammerCloud (HC) is one such tool; it is a stress testing service which is used by central operations teams, regional coordinators, and local site admins to (a) submit an arbitrary number of analysis jobs to a number of sites, (b) maintain a predefined number of jobs running at a steady state at the sites under test, (c) produce web-based reports summarizing the efficiency and performance of the sites under test, and (d) present a web interface for historical test results to both evaluate progress and compare sites. HC was built around the distributed analysis framework Ganga, exploiting its API for grid job management. HC has been employed by the ATLAS experiment for continuous testing of many sites worldwide, and also during large-scale computing challenges such as STEP'09 and UAT'09, where the scale of the tests exceeded 10,000 concurrently running and 1,000,000 total jobs over multi-day periods. In addition, HC is being adopted by the CMS experiment; the plugin structure of HC allows the execution of CMS jobs using their official tool (CRAB). |
id | cern-1319378 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2011 |
record_format | invenio |
spelling | cern-1319378; 2019-09-30T06:29:59Z; http://cds.cern.ch/record/1319378; eng; van der Ster, Daniel C; Elmsheuser, Johannes; Ubeda Garcia, Mario; Paladin, Massimo; HammerCloud: A Stress Testing System for Distributed Analysis; Computing and Computers; Distributed analysis of LHC data is an I/O-intensive activity which places large demands on the internal network, storage, and local disks at remote computing facilities. Commissioning and maintaining a site to provide an efficient distributed analysis service is therefore a challenge which can be aided by tools to help evaluate a variety of infrastructure designs and configurations. HammerCloud (HC) is one such tool; it is a stress testing service which is used by central operations teams, regional coordinators, and local site admins to (a) submit an arbitrary number of analysis jobs to a number of sites, (b) maintain a predefined number of jobs running at a steady state at the sites under test, (c) produce web-based reports summarizing the efficiency and performance of the sites under test, and (d) present a web interface for historical test results to both evaluate progress and compare sites. HC was built around the distributed analysis framework Ganga, exploiting its API for grid job management. HC has been employed by the ATLAS experiment for continuous testing of many sites worldwide, and also during large-scale computing challenges such as STEP'09 and UAT'09, where the scale of the tests exceeded 10,000 concurrently running and 1,000,000 total jobs over multi-day periods. In addition, HC is being adopted by the CMS experiment; the plugin structure of HC allows the execution of CMS jobs using their official tool (CRAB).; CERN-IT-2011-001; oai:cds.cern.ch:1319378; 2011-01-06 |
spellingShingle | Computing and Computers van der Ster, Daniel C Elmsheuser, Johannes Ubeda Garcia, Mario Paladin, Massimo HammerCloud: A Stress Testing System for Distributed Analysis |
title | HammerCloud: A Stress Testing System for Distributed Analysis |
title_full | HammerCloud: A Stress Testing System for Distributed Analysis |
title_fullStr | HammerCloud: A Stress Testing System for Distributed Analysis |
title_full_unstemmed | HammerCloud: A Stress Testing System for Distributed Analysis |
title_short | HammerCloud: A Stress Testing System for Distributed Analysis |
title_sort | hammercloud: a stress testing system for distributed analysis |
topic | Computing and Computers |
url | http://cds.cern.ch/record/1319378 |
work_keys_str_mv | AT vandersterdanielc hammercloudastresstestingsystemfordistributedanalysis AT elmsheuserjohannes hammercloudastresstestingsystemfordistributedanalysis AT ubedagarciamario hammercloudastresstestingsystemfordistributedanalysis AT paladinmassimo hammercloudastresstestingsystemfordistributedanalysis |