High Energy Physics Data Popularity: ATLAS Datasets Popularity Case Study

Bibliographic Details
Main authors: Grigoryeva, Maria; Tretyakov, Eugeny; Artamonov, Aleksei; Klimentov, Alexei; Golubkov, Dmitry; Korchuganova, Tatiana; Alekseev, Aleksandr; Galkin, Timofei
Language: eng
Published: 2020
Subjects:
Online access: http://cds.cern.ch/record/2730116
Description
Summary: The amount of scientific data generated by the LHC experiments has hit the exabyte scale. These data are transferred, processed and analyzed in hundreds of computing centers. The popularity of data among individual physicists and university groups has become one of the key factors in efficient data management and processing. It was actively used during LHC Run-1 and Run-2 by the experiments for central data processing, and allowed them to optimize data placement policies and to spread the workload more evenly over the existing computing resources. Besides central data processing, the LHC experiments provide storage and computing resources for physics analysis to thousands of users. Given the significant increase in data volume and processing time expected after the collider upgrade for the High Luminosity Runs (2027-2036), intelligent data placement based on data access patterns becomes even more crucial than at the beginning of LHC operation. In this study we provide a detailed exploration of data popularity using ATLAS data samples. In addition, we analyze the geolocations of the computing sites where the data were processed, and the locations of the home institutes of users carrying out physics analysis. Cartographic visualization based on these data allows the correlation of existing data placement with physics needs, providing a better understanding of data utilization by different categories of user tasks.