Monitoring the DIRAC distributed system
DIRAC, the LHCb community Grid solution, is intended to reliably run large data mining activities. The DIRAC system consists of various services (which wait to be contacted to perform actions) and agents (which carry out periodic activities) to direct jobs as required. An important part of ensuring...
Main authors: | Santinelli, R; Seco, M; Nandakumar, R |
---|---|
Language: | eng |
Published: | 2010 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.1088/1742-6596/219/6/062061 http://cds.cern.ch/record/1270845 |
_version_ | 1780920205882425344 |
---|---|
author | Santinelli, R Seco, M Nandakumar, R |
author_facet | Santinelli, R Seco, M Nandakumar, R |
author_sort | Santinelli, R |
collection | CERN |
description | DIRAC, the LHCb community Grid solution, is intended to reliably run large data mining activities. The DIRAC system consists of various services (which wait to be contacted to perform actions) and agents (which carry out periodic activities) to direct jobs as required. An important part of ensuring the reliability of the infrastructure is the monitoring and logging of these DIRAC distributed systems. The monitoring collects information from two sources: the first is pinging the services or keeping track of the regular heartbeats of the agents, and the second is the analysis of the error messages generated by both agents and services and collected by a logging system. This allows us to ensure that the components are running properly and to collect useful information regarding their operations. The process status monitoring is displayed using the SLS sensor mechanism, which also automatically allows various quantities to be plotted and a history of the system to be kept. A dedicated GridMap interface (ServiceMap) gives production shifters and experts an immediate, high-impact view of the status of all LHCb critical services, while offering the possibility to refer to details of the SLS and SAM sensors. Error types and statistics provided by the logging service can be accessed via dedicated web interfaces on the DIRAC portal or programmatically via the Python-based API and CLI. |
id | cern-1270845 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2010 |
record_format | invenio |
spelling | cern-1270845 2022-08-17T13:25:17Z doi:10.1088/1742-6596/219/6/062061 http://cds.cern.ch/record/1270845 eng Santinelli, R Seco, M Nandakumar, R Monitoring the DIRAC distributed system Computing and Computers DIRAC, the LHCb community Grid solution, is intended to reliably run large data mining activities. The DIRAC system consists of various services (which wait to be contacted to perform actions) and agents (which carry out periodic activities) to direct jobs as required. An important part of ensuring the reliability of the infrastructure is the monitoring and logging of these DIRAC distributed systems. The monitoring collects information from two sources: the first is pinging the services or keeping track of the regular heartbeats of the agents, and the second is the analysis of the error messages generated by both agents and services and collected by a logging system. This allows us to ensure that the components are running properly and to collect useful information regarding their operations. The process status monitoring is displayed using the SLS sensor mechanism, which also automatically allows various quantities to be plotted and a history of the system to be kept. A dedicated GridMap interface (ServiceMap) gives production shifters and experts an immediate, high-impact view of the status of all LHCb critical services, while offering the possibility to refer to details of the SLS and SAM sensors. Error types and statistics provided by the logging service can be accessed via dedicated web interfaces on the DIRAC portal or programmatically via the Python-based API and CLI. oai:cds.cern.ch:1270845 2010 |
spellingShingle | Computing and Computers Santinelli, R Seco, M Nandakumar, R Monitoring the DIRAC distributed system |
title | Monitoring the DIRAC distributed system |
title_full | Monitoring the DIRAC distributed system |
title_fullStr | Monitoring the DIRAC distributed system |
title_full_unstemmed | Monitoring the DIRAC distributed system |
title_short | Monitoring the DIRAC distributed system |
title_sort | monitoring the dirac distributed system |
topic | Computing and Computers |
url | https://dx.doi.org/10.1088/1742-6596/219/6/062061 http://cds.cern.ch/record/1270845 |
work_keys_str_mv | AT santinellir monitoringthediracdistributedsystem AT secom monitoringthediracdistributedsystem AT nandakumarr monitoringthediracdistributedsystem |
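
The description above mentions keeping track of the regular heartbeats of the agents. As a rough illustration only, the idea can be sketched in Python as below. This is not DIRAC code: the class `HeartbeatMonitor`, its methods, the `max_silence` threshold, and the agent names are all hypothetical, chosen to show how missed heartbeats might be detected under the assumption that each agent reports a timestamp periodically.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical helper (not part of DIRAC): remembers each agent's last heartbeat."""

    max_silence: float  # assumed threshold: seconds of silence before an agent is flagged
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, agent: str, timestamp: float) -> None:
        """Store the time (in seconds) at which an agent last reported in."""
        self.last_seen[agent] = timestamp

    def stale_agents(self, now: float) -> List[str]:
        """Return, in alphabetical order, agents silent for longer than max_silence."""
        return sorted(
            agent
            for agent, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )


# Illustrative usage with made-up agent names and timestamps.
monitor = HeartbeatMonitor(max_silence=60.0)
monitor.record("agent-a", 0.0)
monitor.record("agent-b", 100.0)
print(monitor.stale_agents(now=120.0))
```

Passing `now` explicitly, rather than reading the system clock inside the method, keeps the check deterministic and easy to test; a real deployment would presumably feed in the current time and forward the flagged agents to whatever display mechanism is in use.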