Multi-threaded checksum computation for ATLAS high-performance storage software
ATLAS is one of the general-purpose experiments observing hadron collisions at the LHC at CERN. Its trigger and data acquisition system (TDAQ) is responsible for selecting and transporting interesting physics events from the detector to permanent storage where the data are used for physics analysis....
Main authors: | Le Goff, Fabrice; Avolio, Giuseppe |
---|---|
Language: | eng |
Published: | 2019 |
Subjects: | Particle Physics - Experiment |
Online access: | https://dx.doi.org/10.1088/1742-6596/1525/1/012026 http://cds.cern.ch/record/2673802 |
_version_ | 1780962513042538496 |
---|---|
author | Le Goff, Fabrice; Avolio, Giuseppe |
author_facet | Le Goff, Fabrice; Avolio, Giuseppe |
author_sort | Le Goff, Fabrice |
collection | CERN |
description | ATLAS is one of the general-purpose experiments observing hadron collisions at the LHC at CERN. Its trigger and data acquisition system (TDAQ) is responsible for selecting and transporting interesting physics events from the detector to permanent storage where the data are used for physics analysis. The transient storage of ATLAS TDAQ is the last component of the online system in the data flow. It records selected events at several GB/s to non-volatile storage before transfer to offline permanent storage where physics analyses are undertaken. The transient storage is a distributed system consisting of high-performance direct-attached storage servers with a total of 480 hard drives. A distributed multi-threaded C++ application operates the hardware. The transient storage is also responsible for computing a checksum for the data, which is used to ensure data integrity up to the physics analysis. Reliability and efficiency of this system are critical for the operations of TDAQ as well as the validity of the analysis. This paper presents the existing multi-threading strategy of the software and how the available hardware resources are used. We then describe how multi-threaded checksum computation was introduced to significantly increase the maximum throughput of the system. We discuss the key concepts of the implementation with a focus on the importance of overhead minimization. Finally, the paper reports on tests performed on the production system to demonstrate the validity of the implementation, and on measurements of the performance improvement in view of future LHC and ATLAS upgrades. |
id | cern-2673802 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2019 |
record_format | invenio |
spelling | cern-2673802; 2022-01-14T14:54:47Z; doi:10.1088/1742-6596/1525/1/012026; http://cds.cern.ch/record/2673802; eng; Le Goff, Fabrice; Avolio, Giuseppe; Multi-threaded checksum computation for ATLAS high-performance storage software; Particle Physics - Experiment; ATL-DAQ-PROC-2019-002; oai:cds.cern.ch:2673802; 2019-05-09 |
spellingShingle | Particle Physics - Experiment; Le Goff, Fabrice; Avolio, Giuseppe; Multi-threaded checksum computation for ATLAS high-performance storage software |
title | Multi-threaded checksum computation for ATLAS high-performance storage software |
topic | Particle Physics - Experiment |
url | https://dx.doi.org/10.1088/1742-6596/1525/1/012026 http://cds.cern.ch/record/2673802 |
work_keys_str_mv | AT legofffabrice multithreadedchecksumcomputationforatlashighperformancestoragesoftware AT avoliogiuseppe multithreadedchecksumcomputationforatlashighperformancestoragesoftware |