
Big data solutions for CMS computing monitoring and analytics

The CMS computing infrastructure is composed of several subsystems that accomplish complex tasks such as workload and data management, transfers, submission of user and centrally managed production requests. Till recently, most subsystems were monitored through custom tools and web applications, and...


Bibliographic Details
Main authors:	Ariza-Porras, Christian; Kuznetsov, Valentin; Legger, Federica
Language:	eng
Published:	2020
Subjects:	Detectors and Experimental Techniques; Computing and Computers
Online access:	https://dx.doi.org/10.1051/epjconf/202024503022 http://cds.cern.ch/record/2797456

_version_	1780972392386920448
author	Ariza-Porras, Christian; Kuznetsov, Valentin; Legger, Federica
author_facet	Ariza-Porras, Christian; Kuznetsov, Valentin; Legger, Federica
author_sort	Ariza-Porras, Christian
collection	CERN
description	The CMS computing infrastructure is composed of several subsystems that accomplish complex tasks such as workload and data management, transfers, and submission of user and centrally managed production requests. Until recently, most subsystems were monitored through custom tools and web applications, and logging information was scattered over several sources and typically accessible only by experts. In the last year, CMS computing fostered the adoption of common big data solutions based on open-source, scalable, and no-SQL tools, such as Hadoop, InfluxDB, and ElasticSearch, available through the CERN IT infrastructure. Such systems allow for the easy deployment of monitoring and accounting applications using visualisation tools such as Kibana and Grafana. Alarms can be raised when anomalous conditions in the monitoring data are met, and the relevant teams are automatically notified. Data sources from different subsystems are used to build complex workflows and predictive analytics (such as data popularity, smart caching, and transfer latency), and for performance studies. We describe the full software architecture and data flow, the CMS computing data sources and monitoring applications, and show how the stored data can be used to gain insights into the various subsystems by exploiting scalable solutions based on Spark.
id	cern-2797456
institution	European Organization for Nuclear Research (CERN)
language	eng
publishDate	2020
record_format	invenio
spelling	cern-2797456 2022-10-20T13:54:46Z doi:10.1051/epjconf/202024503022 http://cds.cern.ch/record/2797456 eng Ariza-Porras, Christian; Kuznetsov, Valentin; Legger, Federica Big data solutions for CMS computing monitoring and analytics Detectors and Experimental Techniques; Computing and Computers CMS-CR-2020-026 oai:cds.cern.ch:2797456 2020-01-30
spellingShingle	Detectors and Experimental Techniques Computing and Computers Ariza-Porras, Christian Kuznetsov, Valentin Legger, Federica Big data solutions for CMS computing monitoring and analytics
title	Big data solutions for CMS computing monitoring and analytics
title_full	Big data solutions for CMS computing monitoring and analytics
title_fullStr	Big data solutions for CMS computing monitoring and analytics
title_full_unstemmed	Big data solutions for CMS computing monitoring and analytics
title_short	Big data solutions for CMS computing monitoring and analytics
title_sort	big data solutions for cms computing monitoring and analytics
topic	Detectors and Experimental Techniques; Computing and Computers
url	https://dx.doi.org/10.1051/epjconf/202024503022 http://cds.cern.ch/record/2797456
work_keys_str_mv	AT arizaporraschristian bigdatasolutionsforcmscomputingmonitoringandanalytics AT kuznetsovvalentin bigdatasolutionsforcmscomputingmonitoringandanalytics AT leggerfederica bigdatasolutionsforcmscomputingmonitoringandanalytics
