
LHCb: LHCb Distributed Computing Operations



Bibliographic Details
Main authors: Stagni, F; Santinelli, R
Language: eng
Published: 2011
Online access: http://cds.cern.ch/record/1379877
Description
Summary: The proliferation of tools for monitoring both activities and infrastructure, together with the pressing need to react promptly to problems affecting data taking, data reconstruction, data reprocessing and user analysis, has created a need to better organize the huge amount of information available. The monitoring system for LHCb Grid Computing relies on many heterogeneous and independent sources of information, which offer different views for a better understanding of problems, while an operations team and defined procedures have been put in place to handle them. This work summarizes the state of the art of LHCb Grid operations, emphasizing the reasons that led to the various choices and the tools currently in use to run our daily activities. We highlight the most common problems experienced across years of activity on the WLCG infrastructure, the services and their criticality, the procedures in place, the relevant metrics, the tools available and those still missing.