
The BigPanDA Monitoring System Architecture


Bibliographic Details
Main authors: Korchuganova, Tatiana; Padolski, Siarhei; Wenaus, Torre; Klimentov, Alexei; Alekseev, Aleksandr
Language: eng
Published: 2018
Subjects:
Online access: http://cds.cern.ch/record/2637585
Description
Summary: Currently running large-scale scientific projects involve unprecedented amounts of data and computing power. For example, the ATLAS experiment at the Large Hadron Collider (LHC) collected 140 PB of data over the course of Run 1, and this value increases at a rate of ~800 MB/s during the ongoing Run 2. Processing and analyzing such amounts of data demands the development of complex operational workflow and payload systems, along with the building of cutting-edge computing facilities. In the ATLAS experiment, a key element of payload management is the Production and Distributed Analysis system (PanDA). It consists of several core components, one of which is monitoring. The monitoring component is responsible for providing a comprehensive and coherent view of the tasks and jobs executed by the system, from high-level summaries to detailed drill-down job diagnostics. The BigPanDA monitoring has been in production since the middle of 2014, and it continuously evolves to satisfy increasing demands in functionality and growing payload scales. Today it keeps track of more than 2 million jobs per day distributed over 170 computing centers worldwide in the largest instance of the BigPanDA monitoring: the ATLAS experiment. In this paper we describe the monitoring architecture and its principal features.