Evaluating Performance Portability with the CMS Heterogeneous Pixel Reconstruction code
In recent years the landscape of tools for expressing parallel algorithms portably across compute accelerators has evolved significantly. Many technologies now provide portability between CPUs, GPUs from several vendors, and in some cases even F...
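Main authors: | Andriotis, Nikolaos; Bocci, Andrea; Cano, Eric; Cappelli, Laura; Di Pilato, Antonio; Ferragina, Luca; Hugo, Gabrielle; Kortelainen, Matti Johannes; Kwok, M; Olivera Loyola, Juan Jose; Pantaleo, Felice; Perego, Aurora; Redjeb, Wahid; Dewing, M; Esseiva, J |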
---|---|
Language: | eng |
Published: | 2023 |
Subjects: | Detectors and Experimental Techniques |
Online access: | http://cds.cern.ch/record/2872398 |
_version_ | 1780978607218229248 |
---|---|
author | Andriotis, Nikolaos; Bocci, Andrea; Cano, Eric; Cappelli, Laura; Di Pilato, Antonio; Ferragina, Luca; Hugo, Gabrielle; Kortelainen, Matti Johannes; Kwok, M; Olivera Loyola, Juan Jose; Pantaleo, Felice; Perego, Aurora; Redjeb, Wahid; Dewing, M; Esseiva, J |
author_sort | Andriotis, Nikolaos |
collection | CERN |
description | In recent years the landscape of tools for expressing parallel algorithms portably across compute accelerators has evolved significantly. Many technologies now provide portability between CPUs, GPUs from several vendors, and in some cases even FPGAs. These technologies include C++ libraries such as Alpaka and Kokkos, compiler directives such as OpenMP, the SYCL open specification, which can be implemented as a library or in a compiler, and standard C++, where the compiler is solely responsible for the offloading. Given this developing landscape, users have to choose the technology that best fits their applications and constraints. For example, experience so far with heterogeneous reconstruction algorithms in the CMS experiment suggests that the full application contains a large number of relatively short computational kernels and memory transfer operations. In this work we use a stand-alone version of the CMS heterogeneous pixel reconstruction code as a realistic use case of HEP reconstruction software that is capable of leveraging GPUs effectively. We summarize the experience of porting this code base from CUDA to Alpaka, Kokkos, SYCL, std::par, and OpenMP offloading. We compare the event processing throughput achieved by each version on NVIDIA and AMD GPUs as well as on a CPU, and compare those results to what a native version of the code achieves on each platform. |
id | cern-2872398 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2023 |
record_format | invenio |
spelling | cern-2872398 2023-09-26T18:59:59Z http://cds.cern.ch/record/2872398 eng Detectors and Experimental Techniques CMS-CR-2023-127 oai:cds.cern.ch:2872398 2023-08-25 |
title | Evaluating Performance Portability with the CMS Heterogeneous Pixel Reconstruction code |
topic | Detectors and Experimental Techniques |
url | http://cds.cern.ch/record/2872398 |