
Evaluating HPX as a Next-Gen Scheduler for ATLAS on HPCs


Bibliographic Details
Main Authors: Stanislaus, Beojan; Calafiura, Paolo; Esseiva, Julien; Ju, Xiangyang; Leggett, Charles; Tsulaia, Vakhtang
Language: English
Published: 2022
Subjects: Particle Physics - Experiment
Online Access: http://cds.cern.ch/record/2838149
Description

Experiments at the CERN High-Luminosity Large Hadron Collider (HL-LHC) will produce hundreds of petabytes of data per year. Processing this dataset efficiently represents a significant technical and human-resource challenge. Today, ATLAS data-processing applications run in multi-threaded mode, using Intel TBB for thread management, which allows efficient utilization of all available CPU cores on the computing resources. However, modern HPC systems and high-end computing clusters are increasingly based on heterogeneous architectures, usually a combination of CPUs and accelerators (e.g., GPUs, FPGAs). To run ATLAS software efficiently on these machines, we started developing a distributed, fine-grained, vertically integrated task-scheduling software system.

A first, simplified implementation of such a system, called Raythena, was developed in late 2019. It is based on Ray, a high-performance distributed execution platform developed by RISELab at UC Berkeley. Raythena leverages the ATLAS Event Service architecture for efficient utilization of CPU resources on HPC systems by dynamically assigning fine-grained workloads (individual events or event ranges) to ATLAS data-processing applications running simultaneously on multiple HPC compute nodes. The main purpose of the Raythena project was to gain experience developing real-life applications with the Ray platform. However, to achieve our main objective we need to design a new system capable of utilizing heterogeneous computing resources in a distributed environment. To that end, we have started to evaluate HPX as an alternative to TBB/Ray. HPX is a C++ library for concurrency and parallelism developed by the STE||AR Group, which exposes a uniform, standards-oriented API for programming parallel, distributed, and heterogeneous applications.
This presentation describes preliminary results of evaluating HPX for implementing a task scheduler for ATLAS data-processing applications, aimed at enabling cross-node scheduling on heterogeneous systems that offer a mixture of CPU and GPU architectures. We present prototype applications implemented with HPX and preliminary results of performance studies of these applications.

Significance

This presentation describes design ideas and first simple prototype implementations of a distributed, heterogeneous task-scheduling system for the ATLAS experiment. Given the increased data volumes expected to be recorded in the HL-LHC era, it becomes critical for the experiments to efficiently utilize all available computing resources, including the new generation of supercomputers, most of which will be based on heterogeneous architectures.
Record ID: cern-2838149
Institution: European Organization for Nuclear Research (CERN)
Record Format: Invenio
Report Number: ATL-SOFT-SLIDE-2022-551
OAI Identifier: oai:cds.cern.ch:2838149
Record Date: 2022-10-22
Topic: Particle Physics - Experiment