
Monitoring the DIRAC distributed system

DIRAC, the LHCb community Grid solution, is intended to reliably run large data mining activities. The DIRAC system consists of various services (which wait to be contacted to perform actions) and agents (which carry out periodic activities) to direct jobs as required. An important part of ensuring...


Bibliographic Details
Main authors: Santinelli, R; Seco, M; Nandakumar, R
Language: eng
Published: 2010
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/219/6/062061
http://cds.cern.ch/record/1270845
_version_ 1780920205882425344
author Santinelli, R
Seco, M
Nandakumar, R
author_facet Santinelli, R
Seco, M
Nandakumar, R
author_sort Santinelli, R
collection CERN
description DIRAC, the LHCb community Grid solution, is intended to reliably run large data mining activities. The DIRAC system consists of various services (which wait to be contacted to perform actions) and agents (which carry out periodic activities) to direct jobs as required. An important part of ensuring the reliability of the infrastructure is the monitoring and logging of these DIRAC distributed systems. Monitoring is done by collecting information from two sources: the first is pinging the services or keeping track of the regular heartbeats of the agents, and the second is the analysis of the error messages generated by both agents and services and collected by a logging system. This allows us to ensure that the components are running properly and to collect useful information regarding their operations. The process status monitoring is displayed using the SLS sensor mechanism, which also automatically plots various quantities and keeps a history of the system. A dedicated GridMap interface (ServiceMap) gives production shifters and experts an immediate, high-impact view of the status of all LHCb critical services, while offering the possibility to refer to details of the SLS and SAM sensors. Error types and statistics provided by the logging service can be accessed via dedicated web interfaces on the DIRAC portal or programmatically via the Python-based API and CLI.
id cern-1270845
institution European Organization for Nuclear Research
language eng
publishDate 2010
record_format invenio
spelling cern-1270845 2022-08-17T13:25:17Z doi:10.1088/1742-6596/219/6/062061 http://cds.cern.ch/record/1270845 eng Santinelli, R Seco, M Nandakumar, R Monitoring the DIRAC distributed system Computing and Computers DIRAC, the LHCb community Grid solution, is intended to reliably run large data mining activities. The DIRAC system consists of various services (which wait to be contacted to perform actions) and agents (which carry out periodic activities) to direct jobs as required. An important part of ensuring the reliability of the infrastructure is the monitoring and logging of these DIRAC distributed systems. Monitoring is done by collecting information from two sources: the first is pinging the services or keeping track of the regular heartbeats of the agents, and the second is the analysis of the error messages generated by both agents and services and collected by a logging system. This allows us to ensure that the components are running properly and to collect useful information regarding their operations. The process status monitoring is displayed using the SLS sensor mechanism, which also automatically plots various quantities and keeps a history of the system. A dedicated GridMap interface (ServiceMap) gives production shifters and experts an immediate, high-impact view of the status of all LHCb critical services, while offering the possibility to refer to details of the SLS and SAM sensors. Error types and statistics provided by the logging service can be accessed via dedicated web interfaces on the DIRAC portal or programmatically via the Python-based API and CLI. oai:cds.cern.ch:1270845 2010
spellingShingle Computing and Computers
Santinelli, R
Seco, M
Nandakumar, R
Monitoring the DIRAC distributed system
title Monitoring the DIRAC distributed system
title_full Monitoring the DIRAC distributed system
title_fullStr Monitoring the DIRAC distributed system
title_full_unstemmed Monitoring the DIRAC distributed system
title_short Monitoring the DIRAC distributed system
title_sort monitoring the dirac distributed system
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/219/6/062061
http://cds.cern.ch/record/1270845
work_keys_str_mv AT santinellir monitoringthediracdistributedsystem
AT secom monitoringthediracdistributedsystem
AT nandakumarr monitoringthediracdistributedsystem
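
Illustrative note on the abstract: the two monitoring sources it describes (liveness from service pings or agent heartbeats, and statistics over logged error messages) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the DIRAC implementation. The names StatusMonitor, record_contact, count_errors, the max_silence threshold, and the component names are all hypothetical and do not come from the record or from the DIRAC API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ComponentState:
    """Hypothetical record for one monitored component."""
    name: str
    kind: str  # "service" (answers pings) or "agent" (sends heartbeats)
    last_seen: Optional[float] = None  # time of last ping reply or heartbeat


class StatusMonitor:
    """Tracks when each component was last heard from (assumed design)."""

    def __init__(self, max_silence: float) -> None:
        # Assumption: a component silent for longer than this is "stale".
        self.max_silence = max_silence
        self.components: Dict[str, ComponentState] = {}

    def register(self, name: str, kind: str) -> None:
        self.components[name] = ComponentState(name=name, kind=kind)

    def record_contact(self, name: str, when: float) -> None:
        # Called on a successful ping (service) or a received heartbeat (agent).
        self.components[name].last_seen = when

    def status(self, name: str, now: float) -> str:
        state = self.components[name]
        if state.last_seen is None:
            return "unknown"
        return "ok" if now - state.last_seen <= self.max_silence else "stale"


def count_errors(error_types: Iterable[str]) -> Counter:
    """Aggregates logged error types into simple per-type counts."""
    return Counter(error_types)


monitor = StatusMonitor(max_silence=60.0)
monitor.register("example-service", "service")
monitor.register("example-agent", "agent")
monitor.record_contact("example-agent", when=100.0)
errors = count_errors(["Timeout", "Timeout", "AuthFailure"])
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the status check deterministic and easy to test.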