High-throughput and low-latency network communication with NetIO
Main authors: | , , |
---|---|
Language: | eng |
Published: | 2017 |
Subjects: | |
Online access: | https://dx.doi.org/10.1088/1742-6596/898/8/082003 http://cds.cern.ch/record/2260396 |
Summary: | HPC network technologies like Infiniband, TrueScale or OmniPath provide low-latency and high-throughput communication between hosts, which makes them attractive options for data-acquisition systems in large-scale high-energy physics experiments. Like HPC networks, DAQ networks are local and include a well-specified number of systems. Unfortunately, traditional network communication APIs for HPC clusters like MPI or PGAS exclusively target the HPC community and are not well suited for DAQ applications. It is possible to build distributed DAQ applications using low-level system APIs like Infiniband Verbs, but this requires a non-negligible effort and expert knowledge. At the same time, message services like ZeroMQ have gained popularity in the HEP community. They allow building distributed applications with a high-level approach and provide good performance. Unfortunately, their usage usually limits developers to TCP/IP-based networks. While it is possible to operate a TCP/IP stack on top of Infiniband and OmniPath, this approach may not be very efficient compared to direct use of the native APIs. NetIO is a simple, novel asynchronous message service that can operate on Ethernet, Infiniband and similar network fabrics. This publication presents the design and implementation of NetIO, evaluates its use in comparison to other approaches, and shows performance studies. NetIO supports different high-level programming models and typical workloads of HEP applications. The ATLAS FELIX project successfully uses NetIO as its central communication platform. The architecture of NetIO is described in this paper, including the user-level API and the internal design of the data flow. The paper includes a performance evaluation of NetIO, including throughput and latency measurements. The performance is compared against the state-of-the-art ZeroMQ message service. Performance measurements are performed in a lab environment with 40G Ethernet and FDR Infiniband networks. |