XRootD popularity on hadoop clusters

Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the...

Bibliographic Details
Main authors: Meoni, Marco; Boccali, Tommaso; Magini, Nicolò; Menichetti, Luca; Giordano, Domenico
Language: eng
Published: 2017
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/898/7/072027
http://cds.cern.ch/record/2296795
collection CERN
description Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the efficiency of such monitoring. A large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - is being injected into a readily available Hadoop cluster, via several data streamers. The collected metadata is further organized running fast arbitrary queries; this offers the ability to test several Map&Reduce-based frameworks and measure the system speed-up when compared to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations.
id oai-inspirehep.net-1638557
institution European Organization for Nuclear Research
record_format invenio
topic Computing and Computers