NUMA-Awareness as a Plug-In for an Eventify-Based Fast Multipole Method
Following the trend towards Exascale, today’s supercomputers consist of increasingly complex and heterogeneous compute nodes. To exploit the performance of these systems, research software in HPC needs to keep up with the rapid development of hardware architectures. Since manual tuning of software t...
Main authors: | Morgenstern, Laura; Haensel, David; Beckmann, Andreas; Kabadshow, Ivo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304697/ http://dx.doi.org/10.1007/978-3-030-50436-6_31 |
_version_ | 1783548307263455232 |
---|---|
author | Morgenstern, Laura; Haensel, David; Beckmann, Andreas; Kabadshow, Ivo |
author_sort | Morgenstern, Laura |
collection | PubMed |
description | Following the trend towards Exascale, today’s supercomputers consist of increasingly complex and heterogeneous compute nodes. To exploit the performance of these systems, research software in HPC needs to keep up with the rapid development of hardware architectures. Since manual tuning of software to each and every architecture is neither sustainable nor viable, we aim to tackle this challenge through appropriate software design. In this article, we aim to improve the performance and sustainability of FMSolvr, a parallel Fast Multipole Method for Molecular Dynamics, by adapting it to Non-Uniform Memory Access (NUMA) architectures in a portable and maintainable way. The parallelization of FMSolvr is based on Eventify, an event-based tasking framework we co-developed with FMSolvr. We describe a layered software architecture that enables the separation of the Fast Multipole Method from its parallelization. The focus of this article is on the development and analysis of a reusable NUMA module that improves performance while keeping both layers separated to preserve maintainability and extensibility. By means of the NUMA module, we introduce diverse NUMA-aware data distribution, thread pinning, and work stealing policies for FMSolvr. During the performance analysis, the modular design of the NUMA module proved advantageous since it facilitates combination, interchange, and redesign of the developed policies. The performance analysis reveals that these policies reduce the runtime of FMSolvr by about 21% from 1.48 ms to 1.16 ms. |
format | Online Article Text |
id | pubmed-7304697 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7304697 2020-06-22 NUMA-Awareness as a Plug-In for an Eventify-Based Fast Multipole Method Morgenstern, Laura; Haensel, David; Beckmann, Andreas; Kabadshow, Ivo Computational Science – ICCS 2020 Article 2020-05-25 /pmc/articles/PMC7304697/ http://dx.doi.org/10.1007/978-3-030-50436-6_31 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | NUMA-Awareness as a Plug-In for an Eventify-Based Fast Multipole Method |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304697/ http://dx.doi.org/10.1007/978-3-030-50436-6_31 |