Detection of data taking anomalies for the ATLAS experiment
The physics signals produced by the ATLAS detector at the Large Hadron Collider (LHC) at CERN are acquired and selected by a distributed Trigger and Data AcQuisition (TDAQ) system, comprising a large number of hardware devices and software components. In this work, we focus on the problem of online...
Main authors: | De Castro Vargas Fernandes, Julio; Seixas, Jose; Lehmann Miotto, Giovanna |
---|---|
Language: | eng |
Published: | 2015 |
Subjects: | Particle Physics - Experiment |
Online access: | http://cds.cern.ch/record/2053244 |
_version_ | 1780948203292590080 |
---|---|
author | De Castro Vargas Fernandes, Julio Seixas, Jose Lehmann Miotto, Giovanna |
author_facet | De Castro Vargas Fernandes, Julio Seixas, Jose Lehmann Miotto, Giovanna |
author_sort | De Castro Vargas Fernandes, Julio |
collection | CERN |
description | The physics signals produced by the ATLAS detector at the Large Hadron Collider (LHC) at CERN are acquired and selected by a distributed Trigger and Data AcQuisition (TDAQ) system, comprising a large number of hardware devices and software components. In this work, we focus on the problem of online detection of anomalies during data taking. Anomalies, in this context, are defined as unexpected behaviour of the TDAQ system that results in a loss of data taking efficiency; the causes of those anomalies may come from the TDAQ itself or from external sources. While the TDAQ system operates, it publishes a range of useful information (trigger rates, dead times, memory usage…). Over time, this information forms a set of time series that can be monitored in order to detect (and react to) problems (or anomalies). Here, we approach TDAQ operation monitoring from a data quality perspective, i.e., an anomaly is seen as a loss of quality (an outlier) and is reported. This information can be used to react accordingly in quasi real-time, or to perform post-mortem analysis in order to identify the root cause of recurring anomalies and eliminate them. The proposed monitoring method uses a neural network estimator for the standard behaviour of the TDAQ, and an adaptive validation corridor (upper and lower limits for correct prediction) is constructed to evaluate the value of monitoring variables at each acquisition window. The network predicts the expected value of the time series for a given window; if the observed value is within the validation corridor it is accepted, otherwise it is flagged as an anomaly. The validity of this approach is demonstrated using a single time series as indicator, the L1 trigger rate: monitoring data from past physics runs have been used to show that, even with a single variable, the method is capable of identifying anomalies that had gone unnoticed during data taking. |
id | cern-2053244 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2015 |
record_format | invenio |
spelling | cern-2053244 2019-09-30T06:29:59Z http://cds.cern.ch/record/2053244 eng De Castro Vargas Fernandes, Julio; Seixas, Jose; Lehmann Miotto, Giovanna Detection of data taking anomalies for the ATLAS experiment Particle Physics - Experiment ATL-DAQ-SLIDE-2015-700 oai:cds.cern.ch:2053244 2015-09-17 |
spellingShingle | Particle Physics - Experiment De Castro Vargas Fernandes, Julio Seixas, Jose Lehmann Miotto, Giovanna Detection of data taking anomalies for the ATLAS experiment |
title | Detection of data taking anomalies for the ATLAS experiment |
title_full | Detection of data taking anomalies for the ATLAS experiment |
title_fullStr | Detection of data taking anomalies for the ATLAS experiment |
title_full_unstemmed | Detection of data taking anomalies for the ATLAS experiment |
title_short | Detection of data taking anomalies for the ATLAS experiment |
title_sort | detection of data taking anomalies for the atlas experiment |
topic | Particle Physics - Experiment |
url | http://cds.cern.ch/record/2053244 |
work_keys_str_mv | AT decastrovargasfernandesjulio detectionofdatatakinganomaliesfortheatlasexperiment AT seixasjose detectionofdatatakinganomaliesfortheatlasexperiment AT lehmannmiottogiovanna detectionofdatatakinganomaliesfortheatlasexperiment |
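
The accept-or-flag logic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the neural network estimator is replaced here by a plain moving average of recently accepted samples, and the rule used to adapt the corridor (a multiple of the spread of recent prediction errors, with a floor) is an assumption, since the record does not say how the corridor is built. The function name `detect_anomalies` and the parameters `history`, `width` and `min_sigma` are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(series, history=5, width=3.0, min_sigma=1.0):
    """Return the indices of samples that fall outside the validation corridor.

    Stand-in predictor: mean of the last `history` accepted samples
    (the abstract uses a neural network estimator instead).
    Corridor: prediction +/- width * sigma, where sigma is the standard
    deviation of recent prediction errors, floored at `min_sigma`
    (an assumed adaptation rule, not taken from the record).
    """
    accepted = deque(maxlen=history)  # recent values judged normal
    errors = deque(maxlen=history)    # recent prediction errors
    anomalies = []
    for index, value in enumerate(series):
        if len(accepted) < history:
            accepted.append(value)    # warm-up: no prediction possible yet
            continue
        predicted = mean(accepted)
        sigma = min_sigma
        if len(errors) >= 2:
            sigma = max(pstdev(errors), min_sigma)
        lower = predicted - width * sigma
        upper = predicted + width * sigma
        if lower <= value <= upper:
            accepted.append(value)    # inside the corridor: accept and adapt
            errors.append(value - predicted)
        else:
            anomalies.append(index)   # outside the corridor: flag as anomaly
    return anomalies


# Synthetic rate values, invented for illustration only.
rates = [100, 101, 99, 100, 102, 100, 101, 40, 100, 99]
print(detect_anomalies(rates))  # -> [7]
```

Flagged samples are kept out of the history so that a single outlier does not widen the corridor. One consequence is that a lasting change in level would be flagged repeatedly until the predictor is reset or retrained.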