Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)(TM) Streaming-Aggregation Hardware Design and Evaluation

This paper describes the new hardware-based streaming-aggregation capability added to Mellanox’s Scalable Hierarchical Aggregation and Reduction Protocol in its HDR InfiniBand switches. For large messages, this capability is designed to achieve reduction bandwidths similar to those of point-to-point...

Bibliographic Details
Main authors: Graham, Richard L., Levi, Lion, Burredy, Devendar, Bloch, Gil, Shainer, Gilad, Cho, David, Elias, George, Klein, Daniel, Ladd, Joshua, Maor, Ophir, Marelli, Ami, Petrov, Valentin, Romlet, Evyatar, Qin, Yong, Zemah, Ido
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295336/
http://dx.doi.org/10.1007/978-3-030-50743-5_3
collection PubMed
description This paper describes the new hardware-based streaming-aggregation capability added to Mellanox’s Scalable Hierarchical Aggregation and Reduction Protocol in its HDR InfiniBand switches. For large messages, this capability is designed to achieve reduction bandwidths similar to those of point-to-point messages of the same size, and complements the latency-optimized low-latency aggregation reduction capabilities, aimed at small data reductions. MPI_Allreduce() bandwidth measured on an HDR InfiniBand based system achieves about 95% of network bandwidth. For medium and large data reduction this also improves the reduction bandwidth by a factor of 2–5 relative to host-based (e.g., software-based) reduction algorithms. Using this capability also increased DL-Poly and PyTorch application performance by as much as 4% and 18%, respectively. This paper describes SHARP Streaming-Aggregation hardware architecture and a set of synthetic and application benchmarks used to study this new reduction capability, and the range of data sizes for which Streaming-Aggregation performs better than the low-latency aggregation algorithm.
id pubmed-7295336
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7295336 2020-06-16 High Performance Computing Article
2020-05-22 /pmc/articles/PMC7295336/ http://dx.doi.org/10.1007/978-3-030-50743-5_3 Text en © The Author(s) 2020 Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
topic Article