Minimizing CPU Utilization Requirements to Monitor an ATLAS Data Transfer System

Bibliographic Details
Main authors: Leventis, Georgios; Schumacher, Jorn; Donszelmann, Mark
Language: eng
Published: 2019
Subjects:
Online access: https://dx.doi.org/10.1088/1748-0221/15/02/C02009
http://cds.cern.ch/record/2692163
Description
Summary: The ATLAS experiment at LHC will use a PC-based read-out component called FELIX to connect its Front-End Electronics to the Data Acquisition System. FELIX translates proprietary Front-End protocols to Ethernet and vice versa. Currently, FELIX makes use of parallel multi-threading to achieve the data rate requirements. Because FELIX is a non-redundant component of the critical infrastructure, it must be monitored. Monitoring includes, but is not limited to, package statistics, memory utilization, and data rate statistics. For these statistics to be of practical use, however, the parallel threads are required to intercommunicate. The FELIX monitoring implementation prior to this research used thread-safe queues to which the parallel threads pushed data. A central thread would extract and combine the queue contents. Enabling statistics would deteriorate the throughput by more than 500%. To minimize this performance hit, we took advantage of the CPU's microarchitecture features and reduced concurrency. The focus was on hardware-supported atomic operations. These instructions guarantee that a value is only modified if no updates have occurred on that value since it was read. They are used to complement and/or replace parallel computing lock mechanisms. The aforementioned queue system was replaced with sets of C/C++ atomic variables and corresponding atomic functions, hereinafter referred to as atomics. Three implementations were measured. Implementation I had one set of atomic variables accessed from all the parallel threads. Implementation II had a set of atomic variables for every thread. These sets were accumulated by a central thread. Implementation III was the same as implementation II, but appropriate measures were taken to eliminate any concurrency implications. The compiler used during the measurements was GCC, which partially supports the hardware (microarchitecture) optimizations for atomics.
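The atomic update and the first two layouts described above can be sketched in C++ as follows. This is a minimal illustration, not the FELIX code: the `Stats` struct, its field names, and the functions `update_max`, `run_shared`, and `run_per_thread` are hypothetical names introduced here, and the per-thread sets are summed after the workers finish rather than by a concurrently running central thread.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical set of monitoring counters; field names are illustrative only.
struct Stats {
    std::atomic<std::uint64_t> packets{0};
    std::atomic<std::uint64_t> bytes{0};
};

// Compare-and-swap update: the store succeeds only if the value has not
// changed since it was read; on failure `seen` is refreshed and we retry.
inline void update_max(std::atomic<std::uint64_t>& target, std::uint64_t candidate) {
    std::uint64_t seen = target.load(std::memory_order_relaxed);
    while (seen < candidate &&
           !target.compare_exchange_weak(seen, candidate, std::memory_order_relaxed)) {
    }
}

// Layout in the spirit of implementation I: one set shared by all workers.
std::uint64_t run_shared(unsigned workers, std::uint64_t events) {
    Stats shared;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([&shared, events] {
            for (std::uint64_t i = 0; i < events; ++i)
                shared.packets.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : pool) t.join();
    return shared.packets.load(std::memory_order_relaxed);
}

// Layout in the spirit of implementation II: one set per worker, summed centrally.
std::uint64_t run_per_thread(unsigned workers, std::uint64_t events) {
    std::vector<Stats> sets(workers);  // one default-constructed set per worker
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([&sets, w, events] {
            for (std::uint64_t i = 0; i < events; ++i)
                sets[w].packets.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : pool) t.join();
    std::uint64_t total = 0;  // simplification: accumulate after the workers finish
    for (const auto& s : sets) total += s.packets.load(std::memory_order_relaxed);
    return total;
}
```

Both functions return the same total for the same inputs; they differ only in how many threads contend for each counter.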
Implementations I and II resulted in negligible differences compared to the original one. The gains were not consistent and were less than what is needed. Some benchmarks even showed a deterioration in performance. Implementation III (concurrency and cache optimized) yielded a performance improvement of up to 625% compared to the initial implementation. The data rate target was reached. Implementations similar to implementation III in our research could benefit similar environments. The results presented demonstrate that atomics can be useful for efficient computations in a multi-threaded environment. However, the results also make clear that concurrency, the system architecture, and the cache hierarchy need to be taken into account in this programming model. The paper details the challenges of atomics and how they were overcome in the implementation of the FELIX monitoring system.
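The summary does not state which measures implementation III took. One common cache-related measure for per-thread atomics is to give each counter its own cache line so that neighbouring counters do not interfere (false sharing). The sketch below assumes that technique and a 64-byte cache line; `kCacheLine`, `kWorkers`, `PaddedCounter`, and `run_padded` are hypothetical names, not taken from the paper.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes
constexpr std::size_t kWorkers = 4;     // illustrative number of worker threads

// Each counter is aligned to, and therefore occupies, a full cache line,
// so two workers never write to the same line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Every worker updates only its own padded counter; the caller sums them.
std::uint64_t run_padded(std::uint64_t events) {
    std::array<PaddedCounter, kWorkers> counters;  // zeroed by the member initializer
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < kWorkers; ++w)
        pool.emplace_back([&counters, w, events] {
            for (std::uint64_t i = 0; i < events; ++i)
                counters[w].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : pool) t.join();
    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value.load(std::memory_order_relaxed);
    return total;
}
```

The counters live in a fixed-size array on the stack here, which keeps the alignment requirement independent of how the standard library allocates over-aligned types on the heap.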