ATLAS EventIndex general dataflow and monitoring infrastructure

Bibliographic details
Main authors: Fernandez Casani, Alvaro, Barberis, Dario, Favareto, Andrea, Garcia Montoro, Carlos, Gonzalez de la Hoz, Santiago, Hrivnac, Julius, Prokoshin, Fedor, Salt, Jose, Sanchez, Javier, Toebbicke, Rainer, Yuan, Ruijun
Language: English
Published: 2017
Subjects:
Online access: https://dx.doi.org/10.1088/1742-6596/898/6/062010
http://cds.cern.ch/record/2243484
Description
Summary: The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing it in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast dataset discovery, event picking, cross-checks with other ATLAS systems and checks for event duplication. The system design and its optimization serve event-picking requests ranging from a few events up to tens of thousands of events; in addition, data consistency checks are performed for large production campaigns. Detecting duplicate events within the scope of physics collections has recently emerged as an important use case. This paper describes the general architecture of the project and the data flow and operational issues, which are addressed by recent developments to improve the throughput of the overall system. In this direction, the data collection system is reducing its use of the messaging infrastructure to overcome the performance shortcomings detected during production peaks: an object store is instead used to convey the EventIndex information, with messages used to signal its location and status. Recent changes in the Producer/Consumer architecture are also presented in detail, as well as the monitoring infrastructure.
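The summary describes moving bulk event information out of the messaging layer into an object store, with messages only signalling where the data is and its status, plus duplicate-event detection within a collection. The sketch below illustrates that general pattern (commonly called the claim-check pattern) together with a simple duplicate check keyed on (run number, event number). It is a hypothetical, in-memory illustration, not the EventIndex Producer/Consumer code or its data format: `InMemoryObjectStore`, `Notification`, `produce`, `consume`, the `status` values and the record fields `run` and `event` are all assumptions; a dict stands in for the object store and `queue.Queue` for the messaging system.

```python
import json
import queue
import uuid
from dataclasses import dataclass


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store: maps keys to raw bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


@dataclass
class Notification:
    """Small message for the messaging layer: carries location and status, not the data."""
    object_key: str
    dataset: str
    status: str  # assumed values, e.g. "ready"


def produce(store, broker, dataset, events):
    """Producer: upload the bulky event records, then send a short notification."""
    key = f"{dataset}/{uuid.uuid4()}.json"
    store.put(key, json.dumps(events).encode("utf-8"))
    broker.put(Notification(object_key=key, dataset=dataset, status="ready"))
    return key


def consume(store, broker):
    """Consumer: drain notifications, fetch each payload, flag duplicate events.

    Returns a dict mapping dataset -> sorted list of duplicated (run, event) pairs.
    """
    seen = {}        # dataset -> set of (run, event) already observed
    duplicates = {}  # dataset -> set of (run, event) observed more than once
    while True:
        try:
            note = broker.get_nowait()
        except queue.Empty:
            break
        if note.status != "ready":
            continue  # skip payloads that are not marked as complete
        events = json.loads(store.get(note.object_key))
        ids = seen.setdefault(note.dataset, set())
        for ev in events:
            ident = (ev["run"], ev["event"])
            if ident in ids:
                duplicates.setdefault(note.dataset, set()).add(ident)
            else:
                ids.add(ident)
    return {ds: sorted(pairs) for ds, pairs in duplicates.items()}


if __name__ == "__main__":
    store = InMemoryObjectStore()
    broker = queue.Queue()
    produce(store, broker, "example_dataset", [{"run": 1, "event": 10}, {"run": 1, "event": 11}])
    produce(store, broker, "example_dataset", [{"run": 1, "event": 11}])
    print(consume(store, broker))  # {'example_dataset': [(1, 11)]}
```

The design point the sketch tries to capture is that message size stays constant regardless of how many events a payload holds, so the broker is no longer the bottleneck during production peaks; the duplicate check is scoped per dataset, so the same (run, event) pair appearing in two different collections is not flagged.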