
Automated load balancing in the ATLAS high-performance storage software


Bibliographic Details
Main authors: Le Goff, Fabrice; Vandelli, Wainer
Language: eng
Published: 2017
Subjects: Particle Physics - Experiment
Online access: https://dx.doi.org/10.1007/978-981-13-1313-4_70
http://cds.cern.ch/record/2270626
author Le Goff, Fabrice
Vandelli, Wainer
collection CERN
description The ATLAS experiment collects proton-proton collision events delivered by the LHC accelerator at CERN. The ATLAS Trigger and Data Acquisition (TDAQ) system selects, transports and eventually records event data from the detector at several gigabytes per second. The data are recorded on transient storage before being delivered to permanent storage. The transient storage consists of high-performance direct-attached storage servers totaling about 500 hard drives. It runs dedicated software in the form of a distributed multi-threaded application. The workload includes both CPU-demanding and IO-oriented tasks. This paper presents the original application threading model for this particular workload, discussing the load-sharing strategy among the available CPU cores. The limitations of this strategy were reached in 2016 due to changes in the trigger configuration involving a new data distribution pattern. We then describe a novel data-driven load-sharing strategy, designed to automatically adapt to evolving operational conditions, as driven by the detector configuration or the physics research goals. The improved efficiency and adaptability of the solution were measured with dedicated studies on both test and production systems. This paper reports on the results of those tests, which demonstrate the capability of operating in a large variety of conditions with minimal user intervention.
id cern-2270626
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2017
record_format invenio
spelling cern-2270626 2019-09-30T06:29:59Z doi:10.1007/978-981-13-1313-4_70 http://cds.cern.ch/record/2270626 eng Le Goff, Fabrice; Vandelli, Wainer. Automated load balancing in the ATLAS high-performance storage software. Particle Physics - Experiment.
ATLAS [1] is one of the general purpose detectors observing proton-proton collisions provided by the LHC [2] at CERN. The ATLAS Trigger and Data Acquisition (TDAQ) system [3] is responsible for conveying the event data from the detector up to a permanent mass-storage system provided by CERN.
This work focuses on the Data Logger system which lies at the end of the data flow path in the TDAQ system. The Data Logger is a transient storage system recording the selected event data on hard drives before transferring them to permanent storage where they are available for offline analysis.
ATL-DAQ-PROC-2017-015 oai:cds.cern.ch:2270626 2017-06-20
title Automated load balancing in the ATLAS high-performance storage software
topic Particle Physics - Experiment