
Multi-threaded checksum computation for ATLAS high-performance storage software

ATLAS is one of the general-purpose experiments observing hadron collisions at the LHC at CERN. Its trigger and data acquisition system (TDAQ) is responsible for selecting and transporting interesting physics events from the detector to permanent storage where the data are used for physics analysis....


Bibliographic Details
Main authors: Le Goff, Fabrice; Avolio, Giuseppe
Language: eng
Published: 2019
Subjects: Particle Physics - Experiment
Online access: https://dx.doi.org/10.1088/1742-6596/1525/1/012026
http://cds.cern.ch/record/2673802
_version_ 1780962513042538496
author Le Goff, Fabrice
Avolio, Giuseppe
author_facet Le Goff, Fabrice
Avolio, Giuseppe
author_sort Le Goff, Fabrice
collection CERN
description ATLAS is one of the general-purpose experiments observing hadron collisions at the LHC at CERN. Its trigger and data acquisition system (TDAQ) is responsible for selecting and transporting interesting physics events from the detector to permanent storage where the data are used for physics analysis. The transient storage of ATLAS TDAQ is the last component of the online system in the data flow. It records selected events at several GB/s to non-volatile storage before transfer to offline permanent storage where physics analyses are undertaken. The transient storage is a distributed system consisting of high-performance direct-attached storage servers totalling 480 hard drives. A distributed multi-threaded C++ application operates the hardware. The transient storage is also responsible for computing a checksum for the data, which is used to ensure data integrity up to the physics analysis. Reliability and efficiency of this system are critical for the operations of TDAQ as well as the validity of the analysis. This paper presents the existing multi-threading strategy of the software and how the available hardware resources are used. We then describe how multi-threaded checksum computation was introduced to significantly increase the maximum throughput of the system. We discuss the key concepts of the implementation with a focus on the importance of overhead minimization. Finally, the paper reports on the tests done on the production system to demonstrate the validity of the implementation and measurements of the performance improvement in view of future LHC and ATLAS upgrades.
id cern-2673802
institution European Organization for Nuclear Research
language eng
publishDate 2019
record_format invenio
spelling cern-2673802
2022-01-14T14:54:47Z
doi:10.1088/1742-6596/1525/1/012026
http://cds.cern.ch/record/2673802
eng
Le Goff, Fabrice
Avolio, Giuseppe
Multi-threaded checksum computation for ATLAS high-performance storage software
Particle Physics - Experiment
ATL-DAQ-PROC-2019-002
oai:cds.cern.ch:2673802
2019-05-09
spellingShingle Particle Physics - Experiment
Le Goff, Fabrice
Avolio, Giuseppe
Multi-threaded checksum computation for ATLAS high-performance storage software
title Multi-threaded checksum computation for ATLAS high-performance storage software
title_full Multi-threaded checksum computation for ATLAS high-performance storage software
title_fullStr Multi-threaded checksum computation for ATLAS high-performance storage software
title_full_unstemmed Multi-threaded checksum computation for ATLAS high-performance storage software
title_short Multi-threaded checksum computation for ATLAS high-performance storage software
title_sort multi-threaded checksum computation for atlas high-performance storage software
topic Particle Physics - Experiment
url https://dx.doi.org/10.1088/1742-6596/1525/1/012026
http://cds.cern.ch/record/2673802
work_keys_str_mv AT legofffabrice multithreadedchecksumcomputationforatlashighperformancestoragesoftware
AT avoliogiuseppe multithreadedchecksumcomputationforatlashighperformancestoragesoftware