Analysis of CERN computing infrastructure and monitoring data

Optimizing a computing infrastructure on the scale of LHC requires a quantitative understanding of a complex network of many different resources and services. For this purpose the CERN IT department and the LHC experiments are collecting a large multitude of logs and performance probes, which are al...

Bibliographic Details
Main authors: Nieke, C, Lassnig, M, Menichetti, L, Motesnitsalis, E, Duellmann, D
Language: eng
Published: 2015
Subjects:
Online access: https://dx.doi.org/10.1088/1742-6596/664/5/052029
http://cds.cern.ch/record/2134589
_version_ 1780949911127195648
author Nieke, C
Lassnig, M
Menichetti, L
Motesnitsalis, E
Duellmann, D
author_sort Nieke, C
collection CERN
description Optimizing a computing infrastructure on the scale of the LHC requires a quantitative understanding of a complex network of many different resources and services. For this purpose the CERN IT department and the LHC experiments collect a large number of logs and performance probes, which are already successfully used for short-term analysis (e.g. operational dashboards) within each group. The IT analytics working group was created to bring together data sources from different services and abstraction levels and to implement a suitable infrastructure for mid- to long-term statistical analysis. It further provides a forum for joint optimization across single service boundaries and for the exchange of analysis methods and tools. To simplify access to the collected data, we implemented an automated repository for cleaned and aggregated data sources based on the Hadoop ecosystem. This contribution describes some of the challenges encountered, such as dealing with heterogeneous data formats and selecting an efficient storage format for MapReduce and external access, and describes the repository user interface. Using this infrastructure we were able to quantitatively analyze the relationship between CPU/wall fraction, latency/throughput constraints of network and disk, and the effective job throughput. In this contribution we first describe the design of the shared analysis infrastructure and then present a summary of first analysis results from the combined data sources.
id oai-inspirehep.net-1413911
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2015
record_format invenio
title Analysis of CERN computing infrastructure and monitoring data
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/664/5/052029
http://cds.cern.ch/record/2134589