
Exploiting Big Data solutions for CMS computing operations analytics

Computing operations at the Large Hadron Collider (LHC) at CERN rely on the Worldwide LHC Computing Grid (WLCG) infrastructure, designed to allow efficient storage, access, and processing of data at the pre-exascale level. A close and detailed study of the computing systems exploited for the LHC physics mission is an increasingly crucial aspect of the High Energy Physics (HEP) roadmap towards the exascale regime. In this context, the Compact Muon Solenoid (CMS) experiment has over the last few years been collecting and storing a large set of heterogeneous non-collision data (e.g. metadata about replica placement, transfer operations, and actual user access to physics datasets). All of this data currently resides on a distributed Hadoop cluster, organized so that fast, arbitrary queries with the Spark analytics framework are a viable approach to Big Data mining. Using a data-driven approach oriented to the analysis of this metadata, which derives from several CMS computing services such as DBS (Data Bookkeeping Service) and MCM (Monte Carlo Management system), we began to focus on data storage and data access over the WLCG infrastructure, and drafted an embryonic software toolkit to investigate recurrent patterns and provide indicators of physics dataset popularity. As a long-term goal, this work aims to contribute to the overall design of a predictive/adaptive system that would eventually reduce the cost and complexity of CMS computing operations while meeting the stringent requirements of the physics analysis community.


Bibliographic Details
Main Authors: Gasperini, Simone, Rossi Tisbeni, Simone, Bonacorsi, Daniele, Lange, David
Language: eng
Published: 2022
Subjects: Computing and Computers
Online Access: https://dx.doi.org/10.22323/1.415.0006
http://cds.cern.ch/record/2861074
_version_ 1780977794272985088
author Gasperini, Simone
Rossi Tisbeni, Simone
Bonacorsi, Daniele
Lange, David
author_facet Gasperini, Simone
Rossi Tisbeni, Simone
Bonacorsi, Daniele
Lange, David
author_sort Gasperini, Simone
collection CERN
description Computing operations at the Large Hadron Collider (LHC) at CERN rely on the Worldwide LHC Computing Grid (WLCG) infrastructure, designed to allow efficient storage, access, and processing of data at the pre-exascale level. A close and detailed study of the computing systems exploited for the LHC physics mission is an increasingly crucial aspect of the High Energy Physics (HEP) roadmap towards the exascale regime. In this context, the Compact Muon Solenoid (CMS) experiment has over the last few years been collecting and storing a large set of heterogeneous non-collision data (e.g. metadata about replica placement, transfer operations, and actual user access to physics datasets). All of this data currently resides on a distributed Hadoop cluster, organized so that fast, arbitrary queries with the Spark analytics framework are a viable approach to Big Data mining. Using a data-driven approach oriented to the analysis of this metadata, which derives from several CMS computing services such as DBS (Data Bookkeeping Service) and MCM (Monte Carlo Management system), we began to focus on data storage and data access over the WLCG infrastructure, and drafted an embryonic software toolkit to investigate recurrent patterns and provide indicators of physics dataset popularity. As a long-term goal, this work aims to contribute to the overall design of a predictive/adaptive system that would eventually reduce the cost and complexity of CMS computing operations while meeting the stringent requirements of the physics analysis community.
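
To make the Spark-based mining approach concrete, here is a minimal PySpark sketch of the kind of query described above. It is illustrative only: the HDFS path, the column names (dataset_name, user, access_time), and the simple popularity metric (access counts over a recent time window) are assumptions made for this example, not the actual schema of the CMS metadata or the toolkit's API.

# A minimal, illustrative PySpark sketch of the kind of query described
# above: counting user accesses per dataset over a recent time window as
# a naive "popularity" indicator. The HDFS path, column names, and metric
# are assumptions for this example, not the actual CMS metadata schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataset-popularity").getOrCreate()

# Hypothetical location of dataset-access metadata on the Hadoop cluster.
accesses = spark.read.parquet("hdfs:///cms/monitoring/dataset_access/")

popularity = (
    accesses
    # Keep only accesses from the last 90 days.
    .where(F.col("access_time") >= F.date_sub(F.current_date(), 90))
    # Aggregate per dataset: total accesses and distinct users.
    .groupBy("dataset_name")
    .agg(
        F.count("*").alias("n_accesses"),
        F.countDistinct("user").alias("n_users"),
    )
    .orderBy(F.desc("n_accesses"))
)

# Most-accessed datasets first: a first-order popularity indicator.
popularity.show(20, truncate=False)

In practice, a toolkit of the kind described would presumably combine several such metadata sources (e.g. DBS and MCM dumps) and richer indicators, but the pattern is the same: read from HDFS, filter, and aggregate with Spark.
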
id cern-2861074
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2022
record_format invenio
spelling cern-2861074 2023-06-16T09:28:15Z
doi:10.22323/1.415.0006
http://cds.cern.ch/record/2861074
eng
Gasperini, Simone
Rossi Tisbeni, Simone
Bonacorsi, Daniele
Lange, David
Exploiting Big Data solutions for CMS computing operations analytics
Computing and Computers
Computing operations at the Large Hadron Collider (LHC) at CERN rely on the Worldwide LHC Computing Grid (WLCG) infrastructure, designed to allow efficient storage, access, and processing of data at the pre-exascale level. A close and detailed study of the computing systems exploited for the LHC physics mission is an increasingly crucial aspect of the High Energy Physics (HEP) roadmap towards the exascale regime. In this context, the Compact Muon Solenoid (CMS) experiment has over the last few years been collecting and storing a large set of heterogeneous non-collision data (e.g. metadata about replica placement, transfer operations, and actual user access to physics datasets). All of this data currently resides on a distributed Hadoop cluster, organized so that fast, arbitrary queries with the Spark analytics framework are a viable approach to Big Data mining. Using a data-driven approach oriented to the analysis of this metadata, which derives from several CMS computing services such as DBS (Data Bookkeeping Service) and MCM (Monte Carlo Management system), we began to focus on data storage and data access over the WLCG infrastructure, and drafted an embryonic software toolkit to investigate recurrent patterns and provide indicators of physics dataset popularity. As a long-term goal, this work aims to contribute to the overall design of a predictive/adaptive system that would eventually reduce the cost and complexity of CMS computing operations while meeting the stringent requirements of the physics analysis community.
oai:cds.cern.ch:2861074
2022
spellingShingle Computing and Computers
Gasperini, Simone
Rossi Tisbeni, Simone
Bonacorsi, Daniele
Lange, David
Exploiting Big Data solutions for CMS computing operations analytics
title Exploiting Big Data solutions for CMS computing operations analytics
title_full Exploiting Big Data solutions for CMS computing operations analytics
title_fullStr Exploiting Big Data solutions for CMS computing operations analytics
title_full_unstemmed Exploiting Big Data solutions for CMS computing operations analytics
title_short Exploiting Big Data solutions for CMS computing operations analytics
title_sort exploiting big data solutions for cms computing operations analytics
topic Computing and Computers
url https://dx.doi.org/10.22323/1.415.0006
http://cds.cern.ch/record/2861074
work_keys_str_mv AT gasperinisimone exploitingbigdatasolutionsforcmscomputingoperationsanalytics
AT rossitisbenisimone exploitingbigdatasolutionsforcmscomputingoperationsanalytics
AT bonacorsidaniele exploitingbigdatasolutionsforcmscomputingoperationsanalytics
AT langedavid exploitingbigdatasolutionsforcmscomputingoperationsanalytics