Benchmarking Data Acquisition event building network performance for the ATLAS HL-LHC upgrade
Main authors: | , , , , , |
---|---|
Language: | eng |
Published: | 2023 |
Subjects: | |
Online access: | http://cds.cern.ch/record/2873584 |
Summary: | The ATLAS experiment Data Acquisition (DAQ) system will face an extensive upgrade to fully exploit the High-Luminosity LHC (HL-LHC) upgrade, allowing it to record data at unprecedented rates. The detector will be read out at 1 MHz generating over 5 TB/s of data which is sent to approximately 600 servers. These data are then transported to the processing farm, comprising approximately 3000 servers, for a further rate reduction. This design poses significant challenges for the Ethernet-based network as it will be required to transport 20 times more data than the current system. The increased data rate, data sizes, and the number of servers will exacerbate the already diagnosed TCP incast effect, which makes it impossible to fully exploit the capabilities of the network and limits the performance of the processing farm. In this paper, we present exhaustive systematic experiments to define buffer requirements in network equipment to minimise the impacts of TCP incast on the processing applications. Three switch models were stress-tested using DAQ traffic patterns as synthetic network load in a test environment at approximately 10% scale of the expected HL-LHC DAQ system size. As the HL-LHC system's desired hardware is not currently available and the lab size is considerably smaller, tests aim to project buffer requirements with different parameters. Thus, new analytic and simulation tools are introduced to support and scale lab projections. Different candidate solutions are analysed, comparing software-based and network hardware cost-to-performance ratios to determine the most effective option that can mitigate the impact of TCP incast. The results of these evaluations will contribute to the decision-making process of acquiring network hardware for the HL-LHC data acquisition. |
---|