
High-Throughput Network Communication with NetIO

HPC network technologies like Infiniband, TrueScale or OmniPath provide low-latency and high-throughput communication between hosts, which makes them attractive options for data-acquisition systems in large-scale high-energy physics experiments. Like HPC networks, DAQ networks are local and include...


Bibliographic Details
Main authors: Schumacher, Jörn; Plessl, Christian; Vandelli, Wainer
Language: eng
Published: 2016
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2229585
_version_ 1780952488808022016
author Schumacher, Jörn
Plessl, Christian
Vandelli, Wainer
author_facet Schumacher, Jörn
Plessl, Christian
Vandelli, Wainer
author_sort Schumacher, Jörn
collection CERN
description HPC network technologies such as Infiniband, TrueScale or OmniPath provide low-latency, high-throughput communication between hosts, which makes them attractive options for data-acquisition (DAQ) systems in large-scale high-energy physics (HEP) experiments. Like HPC networks, DAQ networks are local and include a well-specified number of systems. Unfortunately, traditional network communication APIs for HPC clusters, such as MPI or PGAS, exclusively target the HPC community and are not well suited to DAQ applications. It is possible to build distributed DAQ applications using low-level system APIs like Infiniband Verbs (and this has been done), but doing so requires non-negligible effort and expert knowledge.

On the other hand, message services like 0MQ have gained popularity in the HEP community. Such APIs allow developers to build distributed applications with a high-level approach and provide good performance. Unfortunately, their use usually limits developers to TCP/IP-based networks. While it is possible to operate a TCP/IP stack on top of Infiniband and OmniPath, this approach may not be very efficient compared to direct use of the native APIs.

NetIO is a simple, novel asynchronous message service that can operate on Ethernet, Infiniband and similar network fabrics. In our publication we describe the design and implementation of NetIO and evaluate its use in comparison to other approaches. NetIO supports different high-level programming models and typical workloads of HEP applications. The ATLAS FELIX project successfully uses NetIO as its central communication platform.

The NetIO architecture consists of two layers:
* The outer layer provides users with a choice of several socket types for different message-based communication patterns. At the moment NetIO features a low-latency point-to-point send/receive socket pair, a high-throughput point-to-point send/receive socket pair, and a high-throughput publish/subscribe socket pair.
* The inner layer is pluggable and provides a basic send/receive socket pair to the outer layer, giving it a consistent, uniform API across different network technologies.

There are currently two working backends for NetIO:
* The Ethernet backend is based on TCP/IP and POSIX sockets.
* The Infiniband backend relies on libfabric with the Verbs provider from the OpenFabrics Interfaces Working Group.

The libfabric package also supports other fabric technologies like iWarp, Cisco usNic, Cray GNI, Mellanox MXM and others. Via PSM and PSM2 it also natively supports Intel TrueScale and Intel OmniPath. Since libfabric is already used for the Infiniband backend, we do not foresee major challenges in porting NetIO to OmniPath, and a native OmniPath backend is currently under development.
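The two-layer split described in the abstract can be illustrated with a minimal Python sketch. It is not NetIO's actual API: every name here (`Backend`, `LoopbackBackend`, `PublishSocket`, `SubscribeSocket`, `demo`, the endpoint and topic strings) is hypothetical, and the in-memory loopback backend merely stands in for a real TCP/IP or libfabric backend. The point is only that a publish/subscribe socket in the outer layer can be written against a basic send/receive interface, so the backend can be swapped without touching user code.

```python
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Optional, Set, Tuple


class Backend(ABC):
    """Inner layer (hypothetical): a basic send/receive pair."""

    @abstractmethod
    def send(self, endpoint: str, payload: bytes) -> None:
        """Deliver one message to the named endpoint."""

    @abstractmethod
    def receive(self, endpoint: str) -> Optional[bytes]:
        """Return the next message for the endpoint, or None if none is pending."""


class LoopbackBackend(Backend):
    """In-memory stand-in for a real network backend (assumption for this sketch)."""

    def __init__(self) -> None:
        self._queues: DefaultDict[str, Deque[bytes]] = defaultdict(deque)

    def send(self, endpoint: str, payload: bytes) -> None:
        self._queues[endpoint].append(payload)

    def receive(self, endpoint: str) -> Optional[bytes]:
        queue = self._queues[endpoint]
        return queue.popleft() if queue else None


class PublishSocket:
    """Outer layer: fans a message out to every endpoint subscribed to a topic."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend
        self._subscribers: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_subscriber(self, topic: str, endpoint: str) -> None:
        self._subscribers[topic].add(endpoint)

    def publish(self, topic: str, payload: bytes) -> int:
        """Send the payload to all subscribers; return how many received it."""
        endpoints = sorted(self._subscribers.get(topic, set()))
        for endpoint in endpoints:
            self._backend.send(endpoint, payload)
        return len(endpoints)


class SubscribeSocket:
    """Outer layer: registers interest in topics and drains incoming messages."""

    def __init__(self, backend: Backend, endpoint: str) -> None:
        self._backend = backend
        self._endpoint = endpoint

    def subscribe(self, publisher: PublishSocket, topic: str) -> None:
        publisher.add_subscriber(topic, self._endpoint)

    def poll(self) -> List[bytes]:
        """Return all messages currently pending for this socket, in arrival order."""
        messages: List[bytes] = []
        while True:
            message = self._backend.receive(self._endpoint)
            if message is None:
                return messages
            messages.append(message)


def demo() -> Tuple[List[bytes], List[bytes]]:
    backend = LoopbackBackend()  # swap in another Backend without changing the code below
    publisher = PublishSocket(backend)
    host_a = SubscribeSocket(backend, "host-a")
    host_b = SubscribeSocket(backend, "host-b")
    host_a.subscribe(publisher, "events")
    host_b.subscribe(publisher, "events")
    host_b.subscribe(publisher, "monitoring")
    publisher.publish("events", b"fragment-1")
    publisher.publish("monitoring", b"rate=42")
    return host_a.poll(), host_b.poll()


if __name__ == "__main__":
    print(demo())
```

In this sketch only `LoopbackBackend` knows how bytes move between endpoints; a backend built on POSIX sockets or on libfabric would implement the same two methods, which is the kind of uniformity the pluggable inner layer is described as providing.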
id cern-2229585
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2016
record_format invenio
spelling cern-2229585 | 2019-09-30T06:29:59Z | http://cds.cern.ch/record/2229585 | eng | Schumacher, Jörn; Plessl, Christian; Vandelli, Wainer | High-Throughput Network Communication with NetIO | Particle Physics - Experiment | ATL-DAQ-SLIDE-2016-842 | oai:cds.cern.ch:2229585 | 2016-11-03
spellingShingle Particle Physics - Experiment
Schumacher, Jörn
Plessl, Christian
Vandelli, Wainer
High-Throughput Network Communication with NetIO
title High-Throughput Network Communication with NetIO
title_full High-Throughput Network Communication with NetIO
title_fullStr High-Throughput Network Communication with NetIO
title_full_unstemmed High-Throughput Network Communication with NetIO
title_short High-Throughput Network Communication with NetIO
title_sort high-throughput network communication with netio
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2229585
work_keys_str_mv AT schumacherjorn highthroughputnetworkcommunicationwithnetio
AT plesslchristian highthroughputnetworkcommunicationwithnetio
AT vandelliwainer highthroughputnetworkcommunicationwithnetio