Cargando…

Integrated System for Performance Monitoring of ATLAS TDAQ Network

The ATLAS TDAQ Network consists of three separate networks spanning four levels of the experimental building. Over 200 edge switches and 5 multi-blade chassis routers are used to interconnect 2000 processors, adding up to more than 7000 high speed interfaces. In order to substantially speed-up ad-ho...

Descripción completa

Detalles Bibliográficos
Autores principales: Savu, D, Al-Shabibi, A, Martin, B, Sjoen, R, Batraneanu, S, Stancu, S
Lenguaje:eng
Publicado: 2010
Materias:
Acceso en línea:http://cds.cern.ch/record/1300536
Descripción
Sumario:The ATLAS TDAQ Network consists of three separate networks spanning four levels of the experimental building. Over 200 edge switches and 5 multi-blade chassis routers are used to interconnect 2000 processors, adding up to more than 7000 high speed interfaces. In order to substantially speed-up ad-hoc and post mortem analysis, a scalable, yet flexible, integrated system for monitoring both network statistics and environmental conditions, processor parameters and data taking characteristics was required. For successful up-to-the-minute monitoring, information from many SNMP compliant devices, independent databases and custom APIs was gathered, stored and displayed in an optimal way. Easy navigation and compact aggregation of multiple data sources were the main requirements; characteristics not found in any of the tested products, either open-source or commercial. This paper describes how performance, scalability and display issues were addressed and what challenges the project faced during development and deployment. A full set of modules, including a fast polling SNMP engine, user interfaces using latest web technologies and caching mechanisms, has been designed and developed from scratch. Over the last year the system proved to be stable and reliable, replacing the previous performance monitoring system and extending its capabilities. Currently it is operated using a precision interval of 25 seconds (the industry standard is 30 0 seconds). Although it was developed in order to address the needs for integrated performance monitoring of the ATLAS TDAQ network, the package can be used for monitoring any network with rigid demands of precision and scalability, exceeding normal industry standards.