Analysis and predictive modeling of the performance of the ATLAS TDAQ network
After almost twenty years of research, development and installation, the Large Hadron Collider (LHC) accelerator at CERN produced its first collisions in 2008 and was scheduled to run until the end of 2012. ATLAS (A Toroidal LHC ApparatuS) is the biggest experiment built and operated on the LHC ring. Bei...
Main author: | Leahu, Lucian |
---|---|
Language: | eng |
Published: | 2013 |
Subjects: | Computing and Computers |
Online access: | http://cds.cern.ch/record/1504817 |
_version_ | 1780927225912098816 |
---|---|
author | Leahu, Lucian |
author_facet | Leahu, Lucian |
author_sort | Leahu, Lucian |
collection | CERN |
description | After almost twenty years of research, development and installation, the Large Hadron Collider (LHC) accelerator at CERN produced its first collisions in 2008 and was scheduled to run until the end of 2012. ATLAS (A Toroidal LHC ApparatuS) is the biggest experiment built and operated on the LHC ring. Being a general-purpose detector, it studies a wide range of physics, of which the search for the “God particle”, the Higgs boson, is its most significant mission. In 2012 ATLAS had already recorded collision data, called events, which were, with high probability, candidates for proving the existence of this particle. Capturing such “interesting” events is the task of the ATLAS detector; filtering them from the huge amount of data being generated is the purpose of the Trigger and Data Acquisition (TDAQ) system. ATLAS TDAQ is implemented as a three-layer filter that reduces, in real time, the rate of events (1.6 Mbytes each) to a level that can be written to mass storage: from 40 MHz (64 Tbytes/s) to 200 Hz (320 Mbytes/s). This real-time selection is performed by dedicated hardware in the first level and by large farms of computers in the next two levels, interconnected by a dedicated high-speed Ethernet network. The efficiency of the TDAQ system is determined by the continuity of the event flow and by its capability to sustain the design rates. Level-2 is a key filtering system because it applies physics algorithms to events arriving at 100 kHz and clears 97% of the read-out buffer space, which is critical to flow continuity. The high rate of event processing in Level-2, involving data analysis and data transport over the network, places strict requirements on the total processing time. A contributing factor to this time is network delay and loss, for which requirements were not strictly established at the design stage. We set upper-limit requirements for network delay and loss as being the total Level-2 processing time. This is obtained by applying a mathematical queuing model to the Level-2 system. One limitation of this model is that it cannot isolate the delay and loss caused by the network alone. To overcome this, we introduce an approach applicable to any communicating system, and hence to the Level-2 network as well, that incorporates both loss and delay into a central concept named quality attenuation (∆Q). For data networks we propose a component-wise view of ∆Q that allows us to perform topological compositions which, for the Ethernet case, are easily applicable. We show that this method of finding contributors to the overall ∆Q is a lightweight, cheap and non-intrusive technique, applicable in the system’s operational phase. We obtain, on the one hand, performance indicators for entire network paths and for individual network devices, called Structural Delay, and, on the other hand, a prediction of how the Level-2 network’s ∆Q scales with the load placed on the system. For this we quantified the degree of correlation of the traffic pattern placed on the network by the TDAQ software. The dependency of ∆Q on the load then serves as a requirement trade-off space between the network and the software generating a given type of traffic pattern. |
id | cern-1504817 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2013 |
record_format | invenio |
spelling | cern-1504817 2019-09-30T06:29:59Z http://cds.cern.ch/record/1504817 eng Leahu, Lucian Analysis and predictive modeling of the performance of the ATLAS TDAQ network Computing and Computers CERN-THESIS-2013-004 oai:cds.cern.ch:1504817 2013-01-15T03:09:58Z |
spellingShingle | Computing and Computers Leahu, Lucian Analysis and predictive modeling of the performance of the ATLAS TDAQ network |
title | Analysis and predictive modeling of the performance of the ATLAS TDAQ network |
title_full | Analysis and predictive modeling of the performance of the ATLAS TDAQ network |
title_fullStr | Analysis and predictive modeling of the performance of the ATLAS TDAQ network |
title_full_unstemmed | Analysis and predictive modeling of the performance of the ATLAS TDAQ network |
title_short | Analysis and predictive modeling of the performance of the ATLAS TDAQ network |
title_sort | analysis and predictive modeling of the performance of the atlas tdaq network |
topic | Computing and Computers |
url | http://cds.cern.ch/record/1504817 |
work_keys_str_mv | AT leahulucian analysisandpredictivemodelingoftheperformanceoftheatlastdaqnetwork |
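
The data rates quoted in the description follow directly from the event size and the event rates it gives. The short Python sketch below reproduces that arithmetic. It is only an illustration: the function and variable names are hypothetical, and it assumes decimal units (1 Mbyte = 10^6 bytes, 1 Tbyte = 10^12 bytes), which is the reading under which the quoted figures agree.

```python
# Illustrative check of the rates quoted in the description.
# Assumption: decimal units (1 Mbyte = 1e6 bytes, 1 Tbyte = 1e12 bytes).

EVENT_SIZE_BYTES = 1.6e6   # 1.6 Mbytes per event (from the description)
INPUT_RATE_HZ = 40e6       # 40 MHz event rate entering the three-layer filter
OUTPUT_RATE_HZ = 200.0     # 200 Hz event rate written to mass storage


def throughput_bytes_per_s(event_rate_hz, event_size_bytes=EVENT_SIZE_BYTES):
    """Data throughput is the event rate multiplied by the event size."""
    return event_rate_hz * event_size_bytes


input_bw = throughput_bytes_per_s(INPUT_RATE_HZ)     # bytes per second
output_bw = throughput_bytes_per_s(OUTPUT_RATE_HZ)   # bytes per second
rate_reduction = INPUT_RATE_HZ / OUTPUT_RATE_HZ      # overall rate reduction

print(f"input:  {input_bw / 1e12:.0f} Tbytes/s")
print(f"output: {output_bw / 1e6:.0f} Mbytes/s")
print(f"rate reduction factor: {rate_reduction:.0f}")
```

Under these assumptions the computed values match the 64 Tbytes/s and 320 Mbytes/s stated in the description, and the three filter levels together reduce the event rate by a factor of 200,000.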