Monitoring performance of a highly distributed and complex computing infrastructure in LHCb

In order to ensure an optimal performance of the LHCb Distributed Computing, based on LHCbDIRAC, it is necessary to be able to inspect the behavior over time of many components: firstly the agents and services on which the infrastructure is built, but also all the computing tasks and data transfers...

Bibliographic Details
Main authors: Mathe, Z, Haen, C, Stagni, F
Language: eng
Published: 2017
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/898/9/092028
http://cds.cern.ch/record/2296667
author Mathe, Z
Haen, C
Stagni, F
collection CERN
description To ensure optimal performance of the LHCb Distributed Computing, which is based on LHCbDIRAC, it is necessary to inspect the behavior over time of many components: first the agents and services on which the infrastructure is built, but also all the computing tasks and data transfers managed by this infrastructure. This consists of recording and then analyzing time series of a large number of observables, for which SQL relational databases are far from optimal. Within DIRAC we have therefore been studying novel possibilities based on NoSQL databases (ElasticSearch, OpenTSDB and InfluxDB). As a result of this study, we developed a new monitoring system based on ElasticSearch. It has been deployed on the LHCb Distributed Computing infrastructure, where it collects data from all the components (agents, services, jobs) and allows reports to be created through Kibana and through a web user interface based on the DIRAC web framework. In this paper we describe this new implementation of the DIRAC monitoring system. We give details on the ElasticSearch implementation within the DIRAC general framework, as well as an overview of the advantages of the pipeline aggregation used for creating a dynamic bucketing of the time series. We present the advantages of using the ElasticSearch DSL high-level library for creating and running queries. Finally, we present the performance of the system.
id oai-inspirehep.net-1638622
institution European Organization for Nuclear Research
language eng
publishDate 2017
record_format invenio
spelling oai-inspirehep.net-1638622 2021-02-09T10:06:09Z doi:10.1088/1742-6596/898/9/092028 http://cds.cern.ch/record/2296667 oai:inspirehep.net:1638622 2017
title Monitoring performance of a highly distributed and complex computing infrastructure in LHCb
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/898/9/092028
http://cds.cern.ch/record/2296667