Cargando…

The Error Reporting in the ATLAS TDAQ system

The ATLAS Error Reporting feature, which is used in the TDAQ environment, provides a service that allows experts and shift crew to track and address errors relating to the data taking components and applications. This service, called the Error Reporting Service(ERS), gives software applications the...

Descripción completa

Detalles Bibliográficos
Autores principales: Kolos, S, Kazarov, A, Papaevgeniou, L
Lenguaje:eng
Publicado: 2014
Materias:
Acceso en línea:http://cds.cern.ch/record/1751460
Descripción
Sumario:The ATLAS Error Reporting feature, which is used in the TDAQ environment, provides a service that allows experts and shift crew to track and address errors relating to the data taking components and applications. This service, called the Error Reporting Service(ERS), gives software applications the opportunity to collect and send comprehensive data about errors, happening at run-time, to a place where it can be intercepted in real-time by any other system component. Other ATLAS online control and monitoring tools use the Error Reporting service as one of their main inputs to address system problems in a timely manner and to improve the quality of acquired data. The actual destination of the error messages depends solely on the run-time environment, in which the online applications are operating. When applications send information to ERS, depending on the actual configuration the information may end up in a local file, in a database, in distributed middle-ware, which can transport it to an expert system or display it to a users, who can work around a problem. Thanks to the open framework design of ERS, new information destinations can be added at any moment without touching the reporting and receiving applications. The ERS API is provided in three programming languages used in the ATLAS online environment: C++, Java and Python. All APIs use exceptions for error reporting but each of them exploits advanced features of a given language to simplify program writing experience. For the example, as C++ lacks language support for exceptions, a special macro have been designed to generate hierarchies of C++ exception classes at compile time. Using this approach a software developer can write a single line of code to generate a boilerplate code for a fully qualified C++ exception class declaration with arbitrary number of parameters and multiple constructors, which encapsulates all relevant static information about the given type of issues. When corresponding error occurs at run time, a program just need to create an instance of that class passing relevant values to one of the available class constructors and send this instance to ERS. This paper presents the original design solutions exploited for the ERS implementation and describes the experience of using ERS for the first ATLAS run period, where the cross-system error reporting standardization, introduced by ERS, was one of the key points for successful launching and utilization of automated problem-solving solutions in the TDAQ online environment.