
Operation of the ATLAS distributed computing



Bibliographic Details
Main authors: Barreiro Megino, Fernando Harald; Cameron, David; Di Girolamo, Alessandro; Glushkov, Ivan; Filipcic, Andrej; Legger, Federica; Maeno, Tadashi; Walker, Rodney
Language: eng
Published: 2018
Online access: http://cds.cern.ch/record/2626049
Description
Summary: We describe the central operation of the ATLAS distributed computing system. The majority of compute-intensive activities within ATLAS are carried out on some 350,000 CPU cores on the Grid, augmented by opportunistic usage of significant HPC and volunteer resources. The increasing scale and challenging new payloads demand fine-tuning of operational procedures together with timely development of the production system. We describe several such developments, motivated directly by operational experience. Optimization of inefficient task requests, from both official production and users, is made possible by automatic detection of payload properties. User education, job shaping, and preventative throttling help to increase the overall throughput of the available resources.