Optimizing the data-collection time of a large-scale data-acquisition system through a simulation framework

The ATLAS detector at CERN records particle collision “events” delivered by the Large Hadron Collider. Its data-acquisition system identifies, selects, and stores interesting events in near real-time, with an aggregate throughput of several tens of GB/s. It is a distributed software system executed on a...

Bibliographic Details
Main authors: Colombo, Tommaso; Fröning, Holger; Garcìa, Pedro Javier; Vandelli, Wainer
Language: eng
Published: 2016
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1007/s11227-016-1764-1
http://cds.cern.ch/record/2268412
author Colombo, Tommaso
Fröning, Holger
Garcìa, Pedro Javier
Vandelli, Wainer
collection CERN
description The ATLAS detector at CERN records particle collision “events” delivered by the Large Hadron Collider. Its data-acquisition system identifies, selects, and stores interesting events in near real-time, with an aggregate throughput of several tens of GB/s. It is a distributed software system executed on a farm of roughly 2000 commodity worker nodes communicating via TCP/IP on an Ethernet network. Event data fragments are received from the many detector readout channels and are buffered, collected together, analyzed, and either stored permanently or discarded. This system, and data-acquisition systems in general, are sensitive to the latency of the data transfer from the readout buffers to the worker nodes. Challenges affecting this transfer include the many-to-one communication pattern and the inherently bursty nature of the traffic. The main performance issues brought about by this workload are addressed in this paper, focusing in particular on the so-called TCP incast pathology. Since performing systematic studies of these issues is often impeded by operational constraints related to the mission-critical nature of these systems, we developed a simulation model of the ATLAS data-acquisition system. The resulting simulation tool is based on the well-established, widely used OMNeT++ framework. This tool was successfully validated by comparing the obtained simulation results with existing measurements of the system’s behavior. Furthermore, the simulation tool enables the study of the theoretical behavior of the system in numerous what-if scenarios and with modifications that are not immediately applicable to the real system. In this paper, we take advantage of this to analyze the behavior of the system using different traffic shaping and scheduling policies, and with network hardware modifications. This analysis leads to conclusions that could be used to devise future system enhancements.
id oai-inspirehep.net-1603934
institution European Organization for Nuclear Research
language eng
publishDate 2016
record_format invenio
title Optimizing the data-collection time of a large-scale data-acquisition system through a simulation framework
topic Computing and Computers
url https://dx.doi.org/10.1007/s11227-016-1764-1
http://cds.cern.ch/record/2268412