The BigPanDA self-monitoring alarm system for ATLAS
Main authors: | Alekseev, Aleksandr; Korchuganova, Tatiana; Padolski, Siarhei |
---|---|
Language: | eng |
Published: | 2018 |
Subjects: | Particle Physics - Experiment |
Online access: | http://cds.cern.ch/record/2649752 |
_version_ | 1780960753477484544 |
---|---|
author | Alekseev, Aleksandr; Korchuganova, Tatiana; Padolski, Siarhei |
collection | CERN |
description | The BigPanDA monitoring system is a Web application created to deliver real-time analytics covering many aspects of the ATLAS experiment's distributed computing. The system serves about 35000 requests daily and provides critical information used as input for various decisions: from the distribution of the payload among available resources to tracking issues related to any of the 350k jobs running simultaneously. It evolves intensively; in particular, in 2017 the system received 933 commits, delivering new features and expanding the scope of the presented data. The experience of operating BigPanDA in 24/7 mode led to the development of a multilevel self-monitoring alarm system. This ELK-stack-based solution covers all critical components of BigPanDA: from user authentication to management of the number of connections to the DB backend. The solution provides intelligent error analysis, delivering to the operators only those notifications that need human intervention. We describe the architecture, principal features, and operational experience of the self-monitoring system, as well as its adaptation possibilities. |
id | cern-2649752 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2018 |
record_format | invenio |
spelling | cern-2649752 · 2019-09-30T06:29:59Z · http://cds.cern.ch/record/2649752 · eng · Alekseev, Aleksandr; Korchuganova, Tatiana; Padolski, Siarhei · The BigPanDA self-monitoring alarm system for ATLAS · Particle Physics - Experiment · The BigPanDA monitoring system is a Web-application created to deliver the real-time analytics, covering many aspects of the ATLAS experiment distributed computing. The system serves about 35000 requests daily and provides critical information used as input for various decisions: from distribution of the payload among available resources to issue tracking related to any of 350k jobs running simultaneously. It evolves intensively; in particular, in 2017, the system received 933 commits, delivering new features and expanding the scope of the presented data. The experience of operating BigPanDA in 24/7 mode led to development of a multilevel self-monitoring alarm system. This ELK-stack based solution covers all critical components of the BigPanda: from user authentication to management of the number of connections to the DB backend. The developed solution provides an intelligent error analysis, delivering to the operators only those notifications that need human intervention. We describe the architecture, principal features, and operation experience of self-monitoring, as well as its adaptation possibilities. · ATL-SOFT-PROC-2018-054 · oai:cds.cern.ch:2649752 · 2018-12-03 |
title | The BigPanDA self-monitoring alarm system for ATLAS |
topic | Particle Physics - Experiment |
url | http://cds.cern.ch/record/2649752 |