High-Throughput Network Communication with NetIO
| Field | Value |
|---|---|
| Main authors | |
| Language | eng |
| Published | 2016 |
| Subjects | |
| Online access | http://cds.cern.ch/record/2229585 |
Summary:

HPC network technologies like InfiniBand, TrueScale or OmniPath provide low-latency, high-throughput communication between hosts, which makes them attractive options for data-acquisition systems in large-scale high-energy physics experiments. Like HPC networks, DAQ networks are local and include a well-specified number of systems. Unfortunately, traditional network communication APIs for HPC clusters such as MPI or PGAS target the HPC community exclusively and are not well suited for DAQ applications. It is possible to build distributed DAQ applications using low-level system APIs like InfiniBand Verbs (and this has been done), but it requires non-negligible effort and expert knowledge. On the other hand, message services like 0MQ have gained popularity in the HEP community. Such APIs allow developers to build distributed applications with a high-level approach and provide good performance. Unfortunately, their use usually limits developers to TCP/IP-based networks. While it is possible to operate a TCP/IP stack on top of InfiniBand and OmniPath, this approach may not be very efficient compared to direct use of the native APIs.

NetIO is a simple, novel asynchronous message service that can operate on Ethernet, InfiniBand and similar network fabrics. In our publication we present and describe the design and implementation of NetIO and evaluate its use in comparison to other approaches. NetIO supports different high-level programming models and typical workloads of HEP applications. The ATLAS FELIX project successfully uses NetIO as its central communication platform.

The NetIO architecture consists of two layers:

* The outer layer provides users with a choice of several socket types for different message-based communication patterns. At the moment NetIO features a low-latency point-to-point send/receive socket pair, a high-throughput point-to-point send/receive socket pair, and a high-throughput publish/subscribe socket pair.
* The inner layer is pluggable and provides a basic send/receive socket pair to the outer layer, giving a consistent, uniform API across different network technologies.

There are currently two working backends for NetIO:

* The Ethernet backend is based on TCP/IP and POSIX sockets (a minimal sketch of this kind of plumbing follows this list).
* The InfiniBand backend relies on libfabric with the Verbs provider from the OpenFabrics Interfaces Working Group. The libfabric package also supports other fabric technologies such as iWARP, Cisco usNIC, Cray GNI, Mellanox MXM and others; via PSM and PSM2 it also natively supports Intel TrueScale and Intel OmniPath. Since libfabric is already used for the InfiniBand backend, we do not foresee major challenges in porting NetIO to OmniPath, and a native OmniPath backend is currently under development (a libfabric provider-discovery sketch appears at the end of this record).
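As a rough illustration of the Ethernet backend's level of abstraction, the following minimal C++ sketch shows the kind of POSIX-socket plumbing a TCP/IP backend wraps: connect to a peer and send one length-prefixed message over the TCP byte stream. This is not NetIO code; the host, port and 4-byte framing scheme are illustrative assumptions.

```cpp
// Illustrative sketch only (not NetIO code): the POSIX TCP plumbing that a
// TCP/IP-based backend has to wrap. Host, port and framing are assumptions.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <string>

int main() {
    const char* host = "127.0.0.1";  // assumed peer address
    const uint16_t port = 12345;     // assumed peer port

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);

    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    // Prefix the payload with a 4-byte length so the receiver can recover
    // message boundaries on top of the TCP byte stream.
    std::string payload = "event fragment";
    uint32_t len = htonl(static_cast<uint32_t>(payload.size()));
    send(fd, &len, sizeof(len), 0);
    send(fd, payload.data(), payload.size(), 0);

    close(fd);
    return 0;
}
```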
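A libfabric-based backend typically starts from provider discovery: fi_getinfo() reports which fabric providers (verbs, psm2, sockets, ...) are usable on the host, which is what makes porting between InfiniBand and OmniPath comparatively straightforward. The sketch below is a minimal example assuming the libfabric development headers are installed; it only illustrates that discovery step and is not NetIO code, and the endpoint type and capability hints are assumptions.

```cpp
// Minimal libfabric provider-discovery sketch (not NetIO code).
// Build with: g++ discover.cpp -lfabric
#include <rdma/fabric.h>
#include <cstdio>

int main() {
    fi_info* hints = fi_allocinfo();
    if (!hints) return 1;
    hints->ep_attr->type = FI_EP_MSG;  // assumed: connection-oriented endpoints
    hints->caps = FI_MSG;              // assumed: plain send/receive messaging

    fi_info* info = nullptr;
    int rc = fi_getinfo(FI_VERSION(1, 5), nullptr, nullptr, 0, hints, &info);
    if (rc != 0) {
        std::fprintf(stderr, "fi_getinfo failed: %s\n", fi_strerror(-rc));
        fi_freeinfo(hints);
        return 1;
    }

    // List every provider/fabric combination libfabric can offer on this host.
    for (fi_info* cur = info; cur != nullptr; cur = cur->next) {
        std::printf("provider: %-10s fabric: %s\n",
                    cur->fabric_attr->prov_name, cur->fabric_attr->name);
    }

    fi_freeinfo(info);
    fi_freeinfo(hints);
    return 0;
}
```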