Using Machine Learning techniques for Data Quality Monitoring in CMS and ALICE experiments

Data Quality Assurance plays an important role in all high-energy physics experiments. Currently used methods rely heavily on manual labour and human expert judgements. Hence, multiple attempts are being undertaken to develop automatic solutions especially based on machine learning techniques as the...

Bibliographic Details
Main author: Deja, Kamil Rafal
Language: eng
Published: SISSA 2019
Subjects:
Online access: https://dx.doi.org/10.22323/1.350.0236
http://cds.cern.ch/record/2707754
_version_ 1780964983719329792
author Deja, Kamil Rafal
author_facet Deja, Kamil Rafal
author_sort Deja, Kamil Rafal
collection CERN
description Data Quality Assurance plays an important role in all high-energy physics experiments. Currently used methods rely heavily on manual labour and human expert judgement. Hence, multiple attempts are being made to develop automatic solutions, especially ones based on machine learning techniques, as the core part of Data Quality Monitoring systems. However, anomalies caused by detector malfunctions or sub-optimal data processing are difficult to enumerate a priori and occur rarely, which makes supervised classification difficult to use. Therefore, researchers from different experiments, including ALICE and CMS, work extensively on semi-supervised and unsupervised algorithms in order to distinguish potential outliers without manually assigned labels. In this contribution, we discuss several projects that aim to solve this task. Machine-learning-based solutions bring several advantages and may provide fast and reliable data quality assurance while reducing manpower requirements. A good example of this approach is a model based on a deep autoencoder employed in the CMS experiment, which has been successfully qualified on CMS data collected during the 2016 LHC run. Tests indicate that this solution is able to detect anomalies with high accuracy and a low fake rate when compared against the outcome of manual labelling by experts. Researchers from the ALICE experiment are currently working on a similar task. They intend to perform data quality checks at much finer granularity. The current approach is limited to run classification based on manually set cut-offs on descriptive data statistics. More sophisticated machine-learning-based methods may enable more accurate data selection at the fine granularity of 15-minute data acquisition periods.
id oai-inspirehep.net-1769963
institution European Organization for Nuclear Research
language eng
publishDate 2019
publisher SISSA
record_format invenio
spellingShingle Nuclear Physics - Experiment
Particle Physics - Experiment
Deja, Kamil Rafal
Using Machine Learning techniques for Data Quality Monitoring in CMS and ALICE experiments
title Using Machine Learning techniques for Data Quality Monitoring in CMS and ALICE experiments
title_full Using Machine Learning techniques for Data Quality Monitoring in CMS and ALICE experiments
title_fullStr Using Machine Learning techniques for Data Quality Monitoring in CMS and ALICE experiments
title_full_unstemmed Using Machine Learning techniques for Data Quality Monitoring in CMS and ALICE experiments
title_short Using Machine Learning techniques for Data Quality Monitoring in CMS and ALICE experiments
title_sort using machine learning techniques for data quality monitoring in cms and alice experiments
topic Nuclear Physics - Experiment
Particle Physics - Experiment
url https://dx.doi.org/10.22323/1.350.0236
http://cds.cern.ch/record/2707754
work_keys_str_mv AT dejakamilrafal usingmachinelearningtechniquesfordataqualitymonitoringincmsandaliceexperiments