
LLAMA: The Low Level Abstraction For Memory Access

The performance gap between CPU and memory widens continuously. Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory la...
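The full description in this record contrasts AoS (Array of Structs) and SoA (Struct of Arrays) layouts hidden behind a common abstraction. The following is a minimal plain C++ sketch of that idea, assuming a simple particle record; it does not use LLAMA's actual API, and the names Particle, AoSStorage, SoAStorage, fill and totalMass are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record type, used only for this illustration.
struct Particle {
    float x;
    float y;
    float mass;
};

// Array of Structs: each whole record is stored contiguously.
class AoSStorage {
public:
    explicit AoSStorage(std::size_t n) : data_(n) {}
    float& x(std::size_t i) { return data_[i].x; }
    float& y(std::size_t i) { return data_[i].y; }
    float& mass(std::size_t i) { return data_[i].mass; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<Particle> data_;
};

// Struct of Arrays: each field is stored in its own contiguous array.
class SoAStorage {
public:
    explicit SoAStorage(std::size_t n) : x_(n), y_(n), mass_(n) {}
    float& x(std::size_t i) { return x_[i]; }
    float& y(std::size_t i) { return y_[i]; }
    float& mass(std::size_t i) { return mass_[i]; }
    std::size_t size() const { return x_.size(); }

private:
    std::vector<float> x_;
    std::vector<float> y_;
    std::vector<float> mass_;
};

// Layout-agnostic code: it only uses the accessor interface, so the
// storage type can be exchanged without touching these functions.
template <typename Storage>
void fill(Storage& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        s.x(i) = static_cast<float>(i);
        s.y(i) = 2.0f * static_cast<float>(i);
        s.mass(i) = 1.0f;
    }
}

template <typename Storage>
float totalMass(Storage& s) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < s.size(); ++i) {
        sum += s.mass(i);
    }
    return sum;
}
```

Calling fill and totalMass on an AoSStorage or on a SoAStorage of the same size gives the same result; only the arrangement of the fields in memory differs. Because the accessors are resolved at compile time, a sketch like this adds no indirection of its own, which is the property the description calls zero runtime overhead.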


Bibliographic Details
Main authors:	Gruber, Bernhard Manfred; Amadio, Guilherme; Blomer, Jakob; Matthes, Alexander; Widera, René; Bussmann, Michael
Language:	eng
Published:	2022
Subjects:	Computing and Computers; cs.PF
Online access:	https://dx.doi.org/10.1002/spe.3077 http://cds.cern.ch/record/2772593

collection	CERN
description	The performance gap between CPU and memory widens continuously. Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data structures is ideally decoupled from the rest of a program. This can be accomplished via a zero-runtime-overhead abstraction layer, underneath which memory layouts can be freely exchanged. We present the Low-Level Abstraction of Memory Access (LLAMA), a C++ library that provides such a data structure abstraction layer with example implementations for multidimensional arrays of nested, structured data. LLAMA provides fully C++ compliant methods for defining and switching custom memory layouts for user-defined data types. The library is extensible with third-party allocators. Providing two close-to-life examples, we show that the LLAMA-generated AoS (Array of Structs) and SoA (Struct of Arrays) layouts produce identical code with the same performance characteristics as manually written data structures. Integrations into the SPEC CPU® lbm benchmark and the particle-in-cell simulation PIConGPU demonstrate LLAMA's abilities in real-world applications. LLAMA's layout-aware copy routines can significantly speed up transfer and reshuffling of data between layouts compared with naive element-wise copying. LLAMA provides a novel tool for the development of high-performance C++ applications in a heterogeneous environment.
id	oai-inspirehep.net-1867567
institution	European Organization for Nuclear Research
record_format	invenio
spelling	oai-inspirehep.net-1867567 2023-03-27T14:02:03Z doi:10.1002/spe.3077 http://cds.cern.ch/record/2772593 arXiv:2106.04284 oai:inspirehep.net:1867567
