
Analysis and predictive modeling of the performance of the ATLAS TDAQ network

After almost twenty years of research, development and installation, the Large Hadron Collider (LHC) accelerator at CERN produced its first collisions in 2008, planning to run until the end of 2012. ATLAS (A Toroidal LHC ApparatuS) is the biggest experiment built and operated on the LHC ring. Bei...


Bibliographic Details
Main author: Leahu, Lucian
Language: eng
Published: 2013
Subjects:
Online access: http://cds.cern.ch/record/1504817
_version_ 1780927225912098816
author Leahu, Lucian
author_facet Leahu, Lucian
author_sort Leahu, Lucian
collection CERN
description After almost twenty years of research, development and installation, the Large Hadron Collider (LHC) accelerator at CERN produced its first collisions in 2008 and was planned to run until the end of 2012. ATLAS (A Toroidal LHC ApparatuS) is the largest experiment built and operated on the LHC ring. As a general-purpose detector, it studies a wide range of physics, of which the search for the Higgs boson, the so-called “God particle”, is its most significant mission. In 2012 ATLAS recorded collision data, called events, that were with high probability candidates for proving the existence of this particle. Capturing such “interesting” events is the task of the ATLAS detector; filtering them out of the huge amount of data generated is the purpose of the Trigger and Data Acquisition (TDAQ) system. ATLAS TDAQ is implemented as a three-layer filter that reduces, in real time, the rate of events (1.6 Mbytes each) to a level that can be written to mass storage: from 40 MHz (64 Tbytes/s) to 200 Hz (320 Mbytes/s). This real-time selection is performed by dedicated hardware in the first level and by large computer farms in the next two levels, interconnected by a dedicated high-speed Ethernet network. The efficiency of the TDAQ system is determined by the continuity of the event flow and by its ability to sustain the design rates. Level-2 is a key filtering stage because it applies physics algorithms to events arriving at 100 kHz and clears 97% of the read-out buffer space, which is critical to flow continuity. The high rate of event processing in Level-2, which involves both data analysis and data transport over the network, places strict requirements on the total processing time. A contributing factor is network delay and loss, for which requirements were not strictly established at the design stage. We set upper-limit requirements for the network delay and loss, taking the total Level-2 processing time as the bound.
This bound is obtained by applying a mathematical queuing model to the Level-2 system. One limitation of this model is that it cannot isolate the delay and loss caused by the network alone. To overcome this, we introduce an approach applicable to any communicating system, and hence to the Level-2 network, that incorporates both loss and delay into a central concept named quality attenuation (∆Q). For data networks we propose a component-wise view of ∆Q that allows topological compositions which, in the Ethernet case, are easy to apply. We show that this method of finding the contributors to the overall ∆Q is a lightweight, cheap and non-intrusive technique, applicable in the system’s operational phase. We obtain, on the one hand, performance indicators for entire network paths and for individual network devices, called Structural Delay, and, on the other hand, a prediction of how the Level-2 network’s ∆Q scales with the load placed on the system. For this we quantified the degree of correlation of the traffic pattern placed on the network by the TDAQ software. The dependency of ∆Q on the load then serves as a requirement trade-off space between the network and the software generating a given type of traffic pattern.
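The bandwidth figures quoted in the description follow directly from the event size and the event rates. As a minimal sketch (the function and variable names are illustrative and not taken from the thesis; decimal units are assumed, so 1 Mbyte = 10^6 bytes), the arithmetic can be checked as follows:

```python
# Sanity check of the rates quoted in the abstract.
# Assumption: decimal units (1 Mbyte = 1e6 bytes, 1 Tbyte = 1e12 bytes).
# All names below are hypothetical and used only for illustration.

EVENT_SIZE_BYTES = 1.6e6  # 1.6 Mbytes per event, as stated in the abstract


def throughput_bytes_per_s(event_rate_hz, event_size_bytes=EVENT_SIZE_BYTES):
    """Data throughput produced by events of a fixed size at a given rate."""
    return event_rate_hz * event_size_bytes


# Input side: 40 MHz event rate entering the three-layer filter.
input_throughput = throughput_bytes_per_s(40e6)

# Output side: 200 Hz event rate written to mass storage.
storage_throughput = throughput_bytes_per_s(200)

# Overall reduction in event rate achieved by the filter.
rate_reduction = 40e6 / 200

print(f"input:   {input_throughput / 1e12:.0f} Tbytes/s")
print(f"storage: {storage_throughput / 1e6:.0f} Mbytes/s")
print(f"rate reduction factor: {rate_reduction:.0f}")
```

Under these assumptions the script reproduces the 64 Tbytes/s and 320 Mbytes/s figures, and shows that the three levels together keep roughly one event in 200,000.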
id cern-1504817
institution European Organization for Nuclear Research
language eng
publishDate 2013
record_format invenio
spelling cern-1504817 2019-09-30T06:29:59Z http://cds.cern.ch/record/1504817 eng Leahu, Lucian Analysis and predictive modeling of the performance of the ATLAS TDAQ network Computing and Computers CERN-THESIS-2013-004 oai:cds.cern.ch:1504817 2013-01-15T03:09:58Z
spellingShingle Computing and Computers
Leahu, Lucian
Analysis and predictive modeling of the performance of the ATLAS TDAQ network
title Analysis and predictive modeling of the performance of the ATLAS TDAQ network
title_sort analysis and predictive modeling of the performance of the atlas tdaq network
topic Computing and Computers
url http://cds.cern.ch/record/1504817
work_keys_str_mv AT leahulucian analysisandpredictivemodelingoftheperformanceoftheatlastdaqnetwork