Popularity framework to process dataset tracers and its application on dynamic replica reduction in the ATLAS experiment

Bibliographic Details
Main authors: Molfetas, A, Megino, F, Tykhonov, A, Garonne, V, Campana, S, Lassnig, M, Barisits, M, Dimitrov, G, Viegas, F
Language: English
Published: 2011
Subjects:
Online access: http://cds.cern.ch/record/1359245
Description
Summary: The ATLAS experiment's data management system constantly traces the file movement operations that occur on the Worldwide LHC Computing Grid (WLCG). Due to the large scale of the WLCG, direct statistical analysis of the traces is impossible in real time. Factors that contribute to the scalability problems include the ability of users to initiate on-demand queries, the high dimensionality of tracer entries combined with very low-cardinality parameters, the large size of the namespace, and the rapid rate of file transactions occurring on the Grid. These scalability issues are alleviated by adopting an incremental model that aggregates data, on a daily basis, for all combinations occurring in selected tracer fields. Using this model, relevant statistics about system usage can be queried on demand. We present an implementation of this popularity model in the experiment's distributed data management system, DQ2, and describe a direct application of the popularity framework: an automated cleaning system that uses the statistics to dynamically detect and reduce unpopular replicas from grid sites. This paper describes the architecture of the cleaning system and reports on the results collected from a prototype during the first months of ATLAS detector operation.
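The incremental model summarized above can be illustrated with a minimal Python sketch. It is not the DQ2 implementation; the tracer field names (dataset, site, eventtype), the PopularityStore class, the dataset and site names, and the cleaning rule (a replica is unpopular if it received fewer than `threshold` accesses in a 30-day window) are all hypothetical assumptions. Each day's raw traces are collapsed once into counts per combination of selected field values, so later on-demand queries scan only the much smaller aggregated table instead of the raw traces.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical tracer fields chosen for aggregation (assumption, not DQ2's schema).
SELECTED_FIELDS = ("dataset", "site", "eventtype")


class PopularityStore:
    """Hypothetical store of daily aggregates: date -> Counter of field-value tuples."""

    def __init__(self, fields=SELECTED_FIELDS):
        self.fields = fields
        self.daily = {}

    def ingest_day(self, day, traces):
        # Incremental step: raw traces of one day are reduced to counts per
        # combination of selected field values and are never reprocessed.
        counts = Counter(tuple(t[f] for f in self.fields) for t in traces)
        self.daily.setdefault(day, Counter()).update(counts)

    def query(self, start, end, **filters):
        # On-demand query over the aggregates in [start, end], matching filters.
        idx = {f: i for i, f in enumerate(self.fields)}
        total = Counter()
        for day, counts in self.daily.items():
            if not (start <= day <= end):
                continue
            for key, n in counts.items():
                if all(key[idx[f]] == v for f, v in filters.items()):
                    total[key] += n
        return total

    def accesses(self, start, end, dataset, site):
        # Total accesses to one dataset replica at one site in the window.
        return sum(self.query(start, end, dataset=dataset, site=site).values())


def unpopular_replicas(store, replicas, today, window_days=30, threshold=1):
    """Hypothetical cleaning rule: (dataset, site) replicas with fewer than
    `threshold` accesses in the last `window_days` days are candidates."""
    start = today - timedelta(days=window_days)
    return [(ds, site) for ds, site in replicas
            if store.accesses(start, today, ds, site) < threshold]


# Usage with made-up traces for a single day.
store = PopularityStore()
d = date(2010, 6, 1)
store.ingest_day(d, [
    {"dataset": "data10.A", "site": "CERN", "eventtype": "get"},
    {"dataset": "data10.A", "site": "CERN", "eventtype": "get"},
    {"dataset": "data10.B", "site": "BNL", "eventtype": "get"},
])
replicas = [("data10.A", "CERN"), ("data10.B", "BNL"), ("data10.C", "BNL")]
print(unpopular_replicas(store, replicas, today=d, threshold=2))
# [('data10.B', 'BNL'), ('data10.C', 'BNL')]
```

The design choice this sketch illustrates is trading raw-trace detail for a fixed daily reduction: query cost depends on the number of distinct field-value combinations per day rather than on the number of file transactions.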