
Exploiting CMS data popularity to model the evolution of data management for Run-2 and beyond

During the LHC Run-1 data taking, all experiments collected large data volumes from proton-proton and heavy-ion collisions. The collision data, together with massive volumes of simulated data, were replicated in multiple copies, transferred among various Tier levels, and transformed/slimmed in format/c...


Bibliographic Details
Main authors:	Bonacorsi, D, Boccali, T, Giordano, D, Girone, M, Neri, M, Magini, N, Kuznetsov, V, Wildish, T
Language:	eng
Published:	2015
Subjects:	Computing and Computers
Online access:	https://dx.doi.org/10.1088/1742-6596/664/3/032003 http://cds.cern.ch/record/2134543

author	Bonacorsi, D Boccali, T Giordano, D Girone, M Neri, M Magini, N Kuznetsov, V Wildish, T
collection	CERN
description	During the LHC Run-1 data taking, all experiments collected large data volumes from proton-proton and heavy-ion collisions. The collision data, together with massive volumes of simulated data, were replicated in multiple copies, transferred among various Tier levels, and transformed/slimmed in format/content. These data were then accessed (both locally and remotely) by large groups of distributed analysis communities exploiting the Worldwide LHC Computing Grid infrastructure and services. While efficient data placement strategies - together with optimal data redistribution and deletions on demand - have become the core of static versus dynamic data management projects, little effort has so far been invested in understanding the detailed data-access patterns which surfaced in Run-1. These patterns, if understood, can be used as input to simulations of computing models at the LHC, to optimise existing systems by tuning their behaviour, and to explore next-generation CPU/storage/network co-scheduling solutions. This is of great importance, given that the scale of the computing problem will increase far faster than the resources available to the experiments, for Run-2 and beyond. Studying data-access patterns involves the validation of the quality of the monitoring data collected on the "popularity" of each dataset, the analysis of the frequency and pattern of accesses to different datasets by analysis end-users, the exploration of different views of the popularity data (by physics activity, by region, by data type), the study of the evolution of Run-1 data exploitation over time, and the evaluation of the impact of different data placement and distribution choices on the available network and storage resources and on the computing operations. This work presents some insights from studies on the popularity data from the CMS experiment. We present the properties of a range of physics analysis activities as seen by the data popularity, and make recommendations for how to tune the initial distribution of data in anticipation of how it will be used in Run-2 and beyond.
id	oai-inspirehep.net-1413805
institution	Organización Europea para la Investigación Nuclear
language	eng
publishDate	2015
record_format	invenio
title	Exploiting CMS data popularity to model the evolution of data management for Run-2 and beyond
topic	Computing and Computers
url	https://dx.doi.org/10.1088/1742-6596/664/3/032003 http://cds.cern.ch/record/2134543
