
Detection of data taking anomalies for the ATLAS experiment

The physics signals produced by the ATLAS detector at the Large Hadron Collider (LHC) at CERN are acquired and selected by a distributed Trigger and Data Acquisition (TDAQ) system comprising a large number of hardware devices and software components. In this work, we focus on the problem of online...


Bibliographic Details
Main authors: De Castro Vargas Fernandes, Julio, Seixas, Jose, Lehmann Miotto, Giovanna
Language: eng
Published: 2015
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2053244
_version_ 1780948203292590080
author De Castro Vargas Fernandes, Julio
Seixas, Jose
Lehmann Miotto, Giovanna
author_facet De Castro Vargas Fernandes, Julio
Seixas, Jose
Lehmann Miotto, Giovanna
author_sort De Castro Vargas Fernandes, Julio
collection CERN
description The physics signals produced by the ATLAS detector at the Large Hadron Collider (LHC) at CERN are acquired and selected by a distributed Trigger and Data Acquisition (TDAQ) system comprising a large number of hardware devices and software components. In this work, we focus on the problem of online detection of anomalies during the data-taking period. Anomalies, in this context, are defined as unexpected behaviour of the TDAQ system that results in a loss of data-taking efficiency; the causes of those anomalies may come from the TDAQ itself or from external sources. While the TDAQ system operates, it publishes a variety of useful information (trigger rates, dead times, memory usage…). Over time, this information forms a set of time series that can be monitored in order to detect, and react to, problems (anomalies). Here, we approach TDAQ operation monitoring from a data quality perspective, i.e., an anomaly is seen as a loss of quality (an outlier) and is reported. This information can be used to react in quasi real time, or to perform post-mortem analysis in order to identify the root cause of recurring anomalies and eliminate them. The proposed monitoring method uses a neural network estimator of the standard TDAQ behaviour, and an adaptive validation corridor (upper and lower limits for a correct prediction) is constructed to evaluate the value of the monitored variables at each acquisition window. The network predicts the expected value of the time series for a given window; if the observed value lies within the validation corridor it is accepted, otherwise it is flagged as an anomaly. The validity of this approach is demonstrated using a single time series as indicator, the L1 trigger rate: monitoring data from past physics runs have been used to show that, even with a single variable, the method is capable of identifying anomalies that had gone unnoticed during data taking.
id cern-2053244
institution European Organization for Nuclear Research
language eng
publishDate 2015
record_format invenio
spelling cern-2053244; 2019-09-30T06:29:59Z; http://cds.cern.ch/record/2053244; eng; De Castro Vargas Fernandes, Julio; Seixas, Jose; Lehmann Miotto, Giovanna; Detection of data taking anomalies for the ATLAS experiment; Particle Physics - Experiment; ATL-DAQ-SLIDE-2015-700; oai:cds.cern.ch:2053244; 2015-09-17
spellingShingle Particle Physics - Experiment
De Castro Vargas Fernandes, Julio
Seixas, Jose
Lehmann Miotto, Giovanna
Detection of data taking anomalies for the ATLAS experiment
title Detection of data taking anomalies for the ATLAS experiment
title_full Detection of data taking anomalies for the ATLAS experiment
title_fullStr Detection of data taking anomalies for the ATLAS experiment
title_full_unstemmed Detection of data taking anomalies for the ATLAS experiment
title_short Detection of data taking anomalies for the ATLAS experiment
title_sort detection of data taking anomalies for the atlas experiment
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2053244
work_keys_str_mv AT decastrovargasfernandesjulio detectionofdatatakinganomaliesfortheatlasexperiment
AT seixasjose detectionofdatatakinganomaliesfortheatlasexperiment
AT lehmannmiottogiovanna detectionofdatatakinganomaliesfortheatlasexperiment
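
The description field outlines a predict-then-check scheme: an estimator gives the expected value of a time series for each acquisition window, and the observed value is accepted only if it falls inside an adaptive validation corridor. The record gives no implementation details, so what follows is only a minimal sketch under stated assumptions. The estimator here is a plain mean of recent accepted values standing in for the neural network, and the corridor half-width is assumed to be `k` times the spread of recent accepted residuals with a fixed floor. The names `mean_predictor`, `detect_anomalies`, `window`, `k` and `min_half_width` are all hypothetical, and the sample rates are made up for illustration.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


def mean_predictor(history: Sequence[float]) -> float:
    """Placeholder estimator: mean of recent accepted values.

    This is NOT the neural network mentioned in the abstract; any callable
    that maps a history window to an expected value can be passed instead.
    """
    return mean(history)


def detect_anomalies(
    series: Sequence[float],
    predictor: Callable[[Sequence[float]], float] = mean_predictor,
    window: int = 5,
    k: float = 3.0,
    min_half_width: float = 5.0,
) -> List[Tuple[int, float, float, float]]:
    """Return (index, value, lower, upper) for each sample outside the corridor."""
    history = deque(series[:window], maxlen=window)  # recent accepted values
    residuals = deque(maxlen=window)                 # recent accepted prediction errors
    anomalies = []
    for i in range(window, len(series)):
        value = series[i]
        expected = predictor(list(history))
        # Assumed corridor rule: k times the spread of recent residuals,
        # never narrower than min_half_width.
        spread = pstdev(residuals) if len(residuals) >= 2 else 0.0
        half_width = max(k * spread, min_half_width)
        lower, upper = expected - half_width, expected + half_width
        if lower <= value <= upper:
            # Accepted samples update the estimator input and the corridor.
            history.append(value)
            residuals.append(value - expected)
        else:
            # Flagged samples are reported and kept out of the history.
            anomalies.append((i, value, lower, upper))
    return anomalies


# Illustrative rate values only (not ATLAS data): one sudden drop at index 8.
rates = [100, 101, 99, 100, 100, 101, 99, 100, 40, 100, 101]
flagged = detect_anomalies(rates)
```

Keeping flagged samples out of the history is a design choice made for this sketch, so that a single outlier does not widen the corridor or shift the next prediction; the source does not say how the actual method handles this.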