
Shared I/O components for the ATLAS multi-processing framework

ATLAS uses its multi-processing framework AthenaMP for an increasing number of workflows, including simulation, reconstruction and event data filtering (derivation). After serial initialization, AthenaMP forks worker processes that then process events in parallel, with each worker reading data individually and producing its own output. This mode, however, has inefficiencies: 1) Workers no longer read events sequentially, which negatively affects data caching strategies at the storage backend. 2) For its non-RAW data, ATLAS uses ROOT and compresses across 10-100 events; workers need only a subsample of these events but have to read and decompress the complete buffers. 3) Output files from the individual workers need to be merged in a separate, serial process. 4) Propagating metadata describing the complete event sample through several workers is nontrivial. To address these shortcomings, ATLAS has developed the shared reader and writer components presented in this paper. With the shared reader, a single process reads the data and provides objects to the workers on demand via shared memory. The shared writer uses the same mechanism to collect output objects from the workers and write them to disk. Disk I/O and compression/decompression of data are therefore localized in these components, event access (by the shared reader) remains sequential, and a single output file is produced without merging. Still, for object data, which can be passed between processes only as serialized buffers, the efficiency gains depend on the storage backend functionality.
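
To illustrate the shared-reader pattern the abstract describes, here is a minimal sketch (not ATLAS/AthenaMP code): a single parent process reads events strictly in order and hands serialized buffers to forked workers on demand, so disk I/O and decompression stay in one place. Pipes stand in for the shared-memory channel, the hard-coded strings stand in for compressed ROOT buffers, and the worker count, request token, and all identifiers are illustrative assumptions.

#include <sys/wait.h>
#include <unistd.h>

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    const int kWorkers = 2;
    // Stand-in event store; a real shared reader would stream compressed
    // buffers from ROOT, decompressing each one exactly once.
    const std::vector<std::string> events = {"ev0", "ev1", "ev2", "ev3"};

    int req[kWorkers][2];   // worker -> reader: "give me the next event"
    int data[kWorkers][2];  // reader -> worker: length-prefixed event buffer
    for (int w = 0; w < kWorkers; ++w) {
        pipe(req[w]);
        pipe(data[w]);
    }

    for (int w = 0; w < kWorkers; ++w) {
        if (fork() == 0) {  // worker: request events until the reader says stop
            const char token = 'R';
            for (;;) {
                write(req[w][1], &token, 1);
                uint32_t len = 0;
                read(data[w][0], &len, sizeof len);
                if (len == 0) _exit(0);  // zero length signals end of input
                std::string buf(len, '\0');
                read(data[w][0], &buf[0], len);
                std::printf("worker %d processed %s\n", w, buf.c_str());
            }
        }
    }

    // Parent acts as the shared reader: it alone touches the "file", and it
    // hands events out strictly in order. Blocking round-robin keeps the
    // sketch short; a real implementation would poll() or use shared memory.
    size_t next = 0;
    std::vector<bool> done(kWorkers, false);
    int active = kWorkers;
    while (active > 0) {
        for (int w = 0; w < kWorkers; ++w) {
            if (done[w]) continue;
            char token;
            read(req[w][0], &token, 1);
            if (next < events.size()) {
                const uint32_t len = static_cast<uint32_t>(events[next].size());
                write(data[w][1], &len, sizeof len);
                write(data[w][1], events[next].data(), len);
                ++next;
            } else {
                const uint32_t len = 0;  // no more events: tell worker to exit
                write(data[w][1], &len, sizeof len);
                done[w] = true;
                --active;
            }
        }
    }
    for (int w = 0; w < kWorkers; ++w) wait(nullptr);
    return 0;
}

The shared writer described in the paper mirrors this flow in the opposite direction: workers push serialized output objects to a single writer process, which compresses and writes them, so a single output file is produced and no separate merge step is needed.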


Bibliographic Details
Main Authors: van Gemmeren, Peter; Malon, David; Nowak, Marcin; Tsulaia, Vakhtang
Language: eng
Published: 2017
Subjects: Particle Physics - Experiment
Online Access: http://cds.cern.ch/record/2278398
_version_ 1780955372358467584
author van Gemmeren, Peter
Malon, David
Nowak, Marcin
Tsulaia, Vakhtang
author_facet van Gemmeren, Peter
Malon, David
Nowak, Marcin
Tsulaia, Vakhtang
author_sort van Gemmeren, Peter
collection CERN
description ATLAS uses its multi-processing framework AthenaMP for an increasing number of workflows, including simulation, reconstruction and event data filtering (derivation). After serial initialization, AthenaMP forks worker processes that then process events in parallel, with each worker reading data individually and producing its own output. This mode, however, has inefficiencies: 1) Workers no longer read events sequentially, which negatively affects data caching strategies at the storage backend. 2) For its non-RAW data, ATLAS uses ROOT and compresses across 10-100 events; workers need only a subsample of these events but have to read and decompress the complete buffers. 3) Output files from the individual workers need to be merged in a separate, serial process. 4) Propagating metadata describing the complete event sample through several workers is nontrivial. To address these shortcomings, ATLAS has developed the shared reader and writer components presented in this paper. With the shared reader, a single process reads the data and provides objects to the workers on demand via shared memory. The shared writer uses the same mechanism to collect output objects from the workers and write them to disk. Disk I/O and compression/decompression of data are therefore localized in these components, event access (by the shared reader) remains sequential, and a single output file is produced without merging. Still, for object data, which can be passed between processes only as serialized buffers, the efficiency gains depend on the storage backend functionality.
id cern-2278398
institution European Organization for Nuclear Research
language eng
publishDate 2017
record_format invenio
spelling cern-2278398
2019-09-30T06:29:59Z
http://cds.cern.ch/record/2278398
eng
van Gemmeren, Peter
Malon, David
Nowak, Marcin
Tsulaia, Vakhtang
Shared I/O components for the ATLAS multi-processing framework
Particle Physics - Experiment
ATLAS uses its multi-processing framework AthenaMP for an increasing number of workflows, including simulation, reconstruction and event data filtering (derivation). After serial initialization, AthenaMP forks worker processes that then process events in parallel, with each worker reading data individually and producing its own output. This mode, however, has inefficiencies: 1) The worker no longer reads events sequentially, which negatively affects data caching strategies at the storage backend. 2) For its non-RAW data ATLAS uses ROOT and compresses across 10-100 events. Workers will only need a subsample of these events, but have to read and decompress the complete buffers. 3) Output files from the individual workers need to be merged in a separate, serial process. 4) Propagating metadata describing the complete event sample through several workers is nontrivial. To address these shortcomings, ATLAS has developed shared reader and writer components presented in this paper. With the shared reader, a single process reads the data and provides objects to the workers on demand via shared memory. The shared writer uses the same mechanism to collect output objects from the workers and write them to disk. Disk I/O and compression / decompression of data are therefore localized only in these components, event access (by the shared reader) remains sequential and a single output file is produced without merging. Still for object data, which can only be passed between processes as serialized buffers, the efficiency gains depend upon the storage backend functionality.
ATL-SOFT-SLIDE-2017-654
oai:cds.cern.ch:2278398
2017-08-12
spellingShingle Particle Physics - Experiment
van Gemmeren, Peter
Malon, David
Nowak, Marcin
Tsulaia, Vakhtang
Shared I/O components for the ATLAS multi-processing framework
title Shared I/O components for the ATLAS multi-processing framework
title_full Shared I/O components for the ATLAS multi-processing framework
title_fullStr Shared I/O components for the ATLAS multi-processing framework
title_full_unstemmed Shared I/O components for the ATLAS multi-processing framework
title_short Shared I/O components for the ATLAS multi-processing framework
title_sort shared i/o components for the atlas multi-processing framework
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2278398
work_keys_str_mv AT vangemmerenpeter sharediocomponentsfortheatlasmultiprocessingframework
AT malondavid sharediocomponentsfortheatlasmultiprocessingframework
AT nowakmarcin sharediocomponentsfortheatlasmultiprocessingframework
AT tsulaiavakhtang sharediocomponentsfortheatlasmultiprocessingframework