ATLAS Distributed Computing Automation


Bibliographic Details
Main authors: Schovancova, J, Barreiro Megino, F H, Borrego, C, Campana, S, Di Girolamo, A, Elmsheuser, J, Hejbal, J, Kouba, T, Legger, F, Magradze, E, Medrano Llamas, R, Negri, G, Rinaldi, L, Sciacca, G, Serfon, C, Van Der Ster, D C
Language: eng
Published: 2012
Subjects:
Online access: http://cds.cern.ch/record/1461231
Description
Summary: The ATLAS Experiment benefits from computing resources distributed worldwide at more than 100 WLCG sites. The ATLAS Grid sites provide over 100k CPU job slots and over 100 PB of storage space on disk or tape. Monitoring the status of such a complex infrastructure is essential. The ATLAS Grid infrastructure is monitored 24/7 by two teams of shifters distributed worldwide, by the ATLAS Distributed Computing experts, and by site administrators. In this paper we summarize automation efforts performed within the ATLAS Distributed Computing team in order to reduce manpower costs and improve the reliability of the system. Different aspects of the automation process are described: from the ATLAS Grid site topology provided by the ATLAS Grid Information System, via automatic site testing by HammerCloud, to automatic exclusion from production or analysis activities.