
The BigPanDA self-monitoring alarm system for ATLAS

The BigPanDA monitoring system is a Web application created to deliver real-time analytics covering many aspects of the ATLAS experiment's distributed computing. The system serves about 35,000 requests daily and provides critical information used as input for various decisions: from distribution o...


Bibliographic Details
Main authors: Alekseev, Aleksandr; Korchuganova, Tatiana; Padolski, Siarhei
Language: eng
Published: 2018
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2637639
_version_ 1780959942928236544
author Alekseev, Aleksandr
Korchuganova, Tatiana
Padolski, Siarhei
author_facet Alekseev, Aleksandr
Korchuganova, Tatiana
Padolski, Siarhei
author_sort Alekseev, Aleksandr
collection CERN
description The BigPanDA monitoring system is a Web application created to deliver real-time analytics covering many aspects of the ATLAS experiment's distributed computing. The system serves about 35,000 requests daily and provides critical information used as input for various decisions, from distributing the payload among available resources to tracking issues related to any of the 350k jobs running simultaneously. It evolves intensively; in 2017 alone, the system received 933 commits, delivering new features and expanding the scope of the presented data. The experience of operating BigPanDA in 24/7 mode led to the development of a multilevel self-monitoring alarm system. This ELK-stack-based solution covers all critical components of BigPanDA, from user authentication to management of the number of connections to the DB backend. The solution provides intelligent error analysis, delivering to the operators only those notifications that need human intervention. We describe the architecture, principal features, and operational experience of the self-monitoring system, as well as the possibilities for adapting it.
id cern-2637639
institution European Organization for Nuclear Research
language eng
publishDate 2018
record_format invenio
spelling cern-2637639 | 2019-09-30T06:29:59Z | http://cds.cern.ch/record/2637639 | eng | Alekseev, Aleksandr | Korchuganova, Tatiana | Padolski, Siarhei | The BigPanDA self-monitoring alarm system for ATLAS | Particle Physics - Experiment | ATL-SOFT-SLIDE-2018-696 | oai:cds.cern.ch:2637639 | 2018-09-08
spellingShingle Particle Physics - Experiment
Alekseev, Aleksandr
Korchuganova, Tatiana
Padolski, Siarhei
The BigPanDA self-monitoring alarm system for ATLAS
title The BigPanDA self-monitoring alarm system for ATLAS
title_full The BigPanDA self-monitoring alarm system for ATLAS
title_fullStr The BigPanDA self-monitoring alarm system for ATLAS
title_full_unstemmed The BigPanDA self-monitoring alarm system for ATLAS
title_short The BigPanDA self-monitoring alarm system for ATLAS
title_sort bigpanda self-monitoring alarm system for atlas
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2637639
work_keys_str_mv AT alekseevaleksandr thebigpandaselfmonitoringalarmsystemforatlas
AT korchuganovatatiana thebigpandaselfmonitoringalarmsystemforatlas
AT padolskisiarhei thebigpandaselfmonitoringalarmsystemforatlas
AT alekseevaleksandr bigpandaselfmonitoringalarmsystemforatlas
AT korchuganovatatiana bigpandaselfmonitoringalarmsystemforatlas
AT padolskisiarhei bigpandaselfmonitoringalarmsystemforatlas