XRootD popularity on hadoop clusters
Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the...
Main authors: | Meoni, Marco; Boccali, Tommaso; Magini, Nicolò; Menichetti, Luca; Giordano, Domenico |
---|---|
Language: | eng |
Published: | 2017 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.1088/1742-6596/898/7/072027 http://cds.cern.ch/record/2296795 |
_version_ | 1780956905373433856 |
---|---|
author | Meoni, Marco Boccali, Tommaso Magini, Nicolò Menichetti, Luca Giordano, Domenico |
author_facet | Meoni, Marco Boccali, Tommaso Magini, Nicolò Menichetti, Luca Giordano, Domenico |
author_sort | Meoni, Marco |
collection | CERN |
description | Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the efficiency of such monitoring. A large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - is being injected into a readily available Hadoop cluster, via several data streamers. The collected metadata is further organized running fast arbitrary queries; this offers the ability to test several Map&Reduce-based frameworks and measure the system speed-up when compared to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations. |
id | oai-inspirehep.net-1638557 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2017 |
record_format | invenio |
spelling | oai-inspirehep.net-16385572021-02-09T10:06:26Zdoi:10.1088/1742-6596/898/7/072027http://cds.cern.ch/record/2296795engMeoni, MarcoBoccali, TommasoMagini, NicolòMenichetti, LucaGiordano, DomenicoXRootD popularity on hadoop clustersComputing and ComputersPerformance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the efficiency of such monitoring. A large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - is being injected into a readily available Hadoop cluster, via several data streamers. The collected metadata is further organized running fast arbitrary queries; this offers the ability to test several Map&Reduce-based frameworks and measure the system speed-up when compared to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations.oai:inspirehep.net:16385572017 |
spellingShingle | Computing and Computers Meoni, Marco Boccali, Tommaso Magini, Nicolò Menichetti, Luca Giordano, Domenico XRootD popularity on hadoop clusters |
title | XRootD popularity on hadoop clusters |
title_full | XRootD popularity on hadoop clusters |
title_fullStr | XRootD popularity on hadoop clusters |
title_full_unstemmed | XRootD popularity on hadoop clusters |
title_short | XRootD popularity on hadoop clusters |
title_sort | xrootd popularity on hadoop clusters |
topic | Computing and Computers |
url | https://dx.doi.org/10.1088/1742-6596/898/7/072027 http://cds.cern.ch/record/2296795 |
work_keys_str_mv | AT meonimarco xrootdpopularityonhadoopclusters AT boccalitommaso xrootdpopularityonhadoopclusters AT magininicolo xrootdpopularityonhadoopclusters AT menichettiluca xrootdpopularityonhadoopclusters AT giordanodomenico xrootdpopularityonhadoopclusters |