NUMA-Awareness as a Plug-In for an Eventify-Based Fast Multipole Method

Following the trend towards Exascale, today’s supercomputers consist of increasingly complex and heterogeneous compute nodes. To exploit the performance of these systems, research software in HPC needs to keep up with the rapid development of hardware architectures. Since manual tuning of software t...


Bibliographic Details
Main Authors: Morgenstern, Laura; Haensel, David; Beckmann, Andreas; Kabadshow, Ivo
Format: Online Article Text
Language: English
Published: 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304697/
http://dx.doi.org/10.1007/978-3-030-50436-6_31
author Morgenstern, Laura
Haensel, David
Beckmann, Andreas
Kabadshow, Ivo
collection PubMed
description Following the trend towards Exascale, today’s supercomputers consist of increasingly complex and heterogeneous compute nodes. To exploit the performance of these systems, research software in HPC needs to keep up with the rapid development of hardware architectures. Since manual tuning of software to each and every architecture is neither sustainable nor viable, we aim to tackle this challenge through appropriate software design. In this article, we aim to improve the performance and sustainability of FMSolvr, a parallel Fast Multipole Method for Molecular Dynamics, by adapting it to Non-Uniform Memory Access (NUMA) architectures in a portable and maintainable way. The parallelization of FMSolvr is based on Eventify, an event-based tasking framework we co-developed with FMSolvr. We describe a layered software architecture that separates the Fast Multipole Method from its parallelization. The focus of this article is the development and analysis of a reusable NUMA module that improves performance while keeping both layers separated to preserve maintainability and extensibility. Through the NUMA module, we introduce several NUMA-aware data distribution, thread pinning and work stealing policies for FMSolvr. During the performance analysis, the modular design of the NUMA module proved advantageous because it facilitated the combination, interchange and redesign of the developed policies. The performance analysis reveals that these policies reduce the runtime of FMSolvr by about 21%, from 1.48 ms to 1.16 ms.
format Online
Article
Text
id pubmed-7304697
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7304697 2020-06-22 Computational Science – ICCS 2020 Article
2020-05-25 /pmc/articles/PMC7304697/ http://dx.doi.org/10.1007/978-3-030-50436-6_31 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title NUMA-Awareness as a Plug-In for an Eventify-Based Fast Multipole Method
topic Article