
Evaluating Performance Portability with the CMS Heterogeneous Pixel Reconstruction code

In the past years the landscape of tools for expressing parallel algorithms in a portable way across various compute accelerators has continued to evolve significantly. There are many technologies on the market that provide portability between CPU, GPUs from several vendors, and in some cases even F...


Bibliographic Details
Main authors: Andriotis, Nikolaos, Bocci, Andrea, Cano, Eric, Cappelli, Laura, Di Pilato, Antonio, Ferragina, Luca, Hugo, Gabrielle, Kortelainen, Matti Johannes, Kwok, M, Olivera Loyola, Juan Jose, Pantaleo, Felice, Perego, Aurora, Redjeb, Wahid, Dewing, M, Esseiva, J
Language: eng
Published: 2023
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/2872398
_version_ 1780978607218229248
author Andriotis, Nikolaos
Bocci, Andrea
Cano, Eric
Cappelli, Laura
Di Pilato, Antonio
Ferragina, Luca
Hugo, Gabrielle
Kortelainen, Matti Johannes
Kwok, M
Olivera Loyola, Juan Jose
Pantaleo, Felice
Perego, Aurora
Redjeb, Wahid
Dewing, M
Esseiva, J
author_sort Andriotis, Nikolaos
collection CERN
description In recent years the landscape of tools for expressing parallel algorithms in a portable way across various compute accelerators has continued to evolve significantly. Many technologies now provide portability between CPUs, GPUs from several vendors, and in some cases even FPGAs. These technologies include C++ libraries such as Alpaka and Kokkos, compiler directives such as OpenMP, the SYCL open specification, which can be implemented as a library or in a compiler, and standard C++, where the compiler is solely responsible for the offloading. Given this developing landscape, users have to choose the technology that best fits their applications and constraints. For example, experience so far with heterogeneous reconstruction algorithms in the CMS experiment suggests that the full application contains a large number of relatively short computational kernels and memory transfer operations. In this work we use a stand-alone version of the CMS heterogeneous pixel reconstruction code as a realistic use case of HEP reconstruction software that is capable of leveraging GPUs effectively. We summarize the experience of porting this code base from CUDA to Alpaka, Kokkos, SYCL, std::par, and OpenMP offloading. We compare the event processing throughput achieved by each version on NVIDIA and AMD GPUs as well as on a CPU, and compare those results to what a native version of the code achieves on each platform.
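The comparison described at the end of the abstract, throughput of each portable version against the native version on the same platform, can be expressed as a simple ratio. The following is a minimal sketch of that arithmetic only, not the authors' benchmarking code; the function names `relative_throughput` and `relative_to_native` are hypothetical, and the record itself reports no numbers.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: ratio of a portable version's event throughput to the
// native version's throughput on the same platform (both in events per second).
// 1.0 means parity with native; values below 1.0 mean the portable version is slower.
double relative_throughput(double portable_events_per_s, double native_events_per_s) {
    if (!std::isfinite(portable_events_per_s) || portable_events_per_s < 0.0) {
        throw std::invalid_argument("portable throughput must be finite and non-negative");
    }
    if (!std::isfinite(native_events_per_s) || native_events_per_s <= 0.0) {
        throw std::invalid_argument("native throughput must be finite and positive");
    }
    return portable_events_per_s / native_events_per_s;
}

// Hypothetical helper: apply the ratio to every portable version measured on
// one platform. Keys are free-form labels chosen by the caller.
std::map<std::string, double> relative_to_native(
    const std::map<std::string, double>& portable_events_per_s,
    double native_events_per_s) {
    std::map<std::string, double> ratios;
    for (const auto& entry : portable_events_per_s) {
        ratios[entry.first] = relative_throughput(entry.second, native_events_per_s);
    }
    // Postcondition: one ratio per input measurement.
    assert(ratios.size() == portable_events_per_s.size());
    return ratios;
}
```

As an illustration with made-up values, calling `relative_throughput(800.0, 1000.0)` yields 0.8, meaning the portable version reaches 80% of the native throughput on that platform.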
id cern-2872398
institution European Organization for Nuclear Research
language eng
publishDate 2023
record_format invenio
spelling cern-2872398; 2023-09-26T18:59:59Z; http://cds.cern.ch/record/2872398; eng; Detectors and Experimental Techniques; CMS-CR-2023-127; oai:cds.cern.ch:2872398; 2023-08-25
title Evaluating Performance Portability with the CMS Heterogeneous Pixel Reconstruction code
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/2872398