
High Performance Data Analysis via Coordinated Caches

With the second run period of the LHC, high energy physics collaborations will have to face increasing computing infrastructural needs. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure...


Bibliographic Details
Main authors: Fischer, M; Metzlaff, C; Kühn, E; Giffels, M; Quast, G; Jung, C; Hauth, T
Language: eng
Published: 2015
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/664/9/092008
http://cds.cern.ch/record/2134645
_version_ 1780949922933112832
author Fischer, M
Metzlaff, C
Kühn, E
Giffels, M
Quast, G
Jung, C
Hauth, T
author_facet Fischer, M
Metzlaff, C
Kühn, E
Giffels, M
Quast, G
Jung, C
Hauth, T
author_sort Fischer, M
collection CERN
description With the second run period of the LHC, high energy physics collaborations will have to face increasing computing infrastructural needs. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure with an increased load of analysis tasks that in turn will need to process an increased volume of data. In addition to storage capacities, a key factor for future computing infrastructure is therefore input bandwidth available per core. Modern data analysis infrastructure relies on one of two paradigms: data is kept on dedicated storage and accessed via network or distributed over all compute nodes and accessed locally. Dedicated storage allows data volume to grow independently of processing capacities, whereas local access allows processing capacities to scale linearly. However, with the growing data volume and processing requirements, HEP will require both of these features. For enabling adequate user analyses in the future, the KIT CMS group is merging both paradigms: popular data is spread over a local disk layer on compute nodes, while any data is available from an arbitrarily sized background storage. This concept is implemented as a pool of distributed caches, which are loosely coordinated by a central service. A Tier 3 prototype cluster is currently being set up for performant user analyses of both local and remote data.
id oai-inspirehep.net-1414084
institution European Organization for Nuclear Research
language eng
publishDate 2015
record_format invenio
spelling oai-inspirehep.net-1414084 | 2022-08-10T13:01:07Z | doi:10.1088/1742-6596/664/9/092008 | http://cds.cern.ch/record/2134645 | eng | Fischer, M; Metzlaff, C; Kühn, E; Giffels, M; Quast, G; Jung, C; Hauth, T | High Performance Data Analysis via Coordinated Caches | Computing and Computers | With the second run period of the LHC, high energy physics collaborations will have to face increasing computing infrastructural needs. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure with an increased load of analysis tasks that in turn will need to process an increased volume of data. In addition to storage capacities, a key factor for future computing infrastructure is therefore input bandwidth available per core. Modern data analysis infrastructure relies on one of two paradigms: data is kept on dedicated storage and accessed via network or distributed over all compute nodes and accessed locally. Dedicated storage allows data volume to grow independently of processing capacities, whereas local access allows processing capacities to scale linearly. However, with the growing data volume and processing requirements, HEP will require both of these features. For enabling adequate user analyses in the future, the KIT CMS group is merging both paradigms: popular data is spread over a local disk layer on compute nodes, while any data is available from an arbitrarily sized background storage. This concept is implemented as a pool of distributed caches, which are loosely coordinated by a central service. A Tier 3 prototype cluster is currently being set up for performant user analyses of both local and remote data. | oai:inspirehep.net:1414084 | 2015
spellingShingle Computing and Computers
Fischer, M
Metzlaff, C
Kühn, E
Giffels, M
Quast, G
Jung, C
Hauth, T
High Performance Data Analysis via Coordinated Caches
title High Performance Data Analysis via Coordinated Caches
title_full High Performance Data Analysis via Coordinated Caches
title_fullStr High Performance Data Analysis via Coordinated Caches
title_full_unstemmed High Performance Data Analysis via Coordinated Caches
title_short High Performance Data Analysis via Coordinated Caches
title_sort high performance data analysis via coordinated caches
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/664/9/092008
http://cds.cern.ch/record/2134645
work_keys_str_mv AT fischerm highperformancedataanalysisviacoordinatedcaches
AT metzlaffc highperformancedataanalysisviacoordinatedcaches
AT kuhne highperformancedataanalysisviacoordinatedcaches
AT giffelsm highperformancedataanalysisviacoordinatedcaches
AT quastg highperformancedataanalysisviacoordinatedcaches
AT jungc highperformancedataanalysisviacoordinatedcaches
AT hautht highperformancedataanalysisviacoordinatedcaches
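
The abstract describes popular data being spread over local disk caches on compute nodes, with a central service loosely coordinating the caches and a background storage holding everything else. The following is only a minimal sketch of that idea, not the authors' implementation: the class name `CacheCoordinator`, its methods, the popularity measure (a plain access count), and the round-robin placement are all assumptions made for illustration.

```python
from collections import Counter


class CacheCoordinator:
    """Hypothetical central service (illustrative only).

    Tracks how often each file is read and decides which compute
    node should hold a cached copy of the most popular files.
    """

    def __init__(self, nodes, slots_per_node):
        self.nodes = list(nodes)              # names of compute nodes with local disks
        self.slots_per_node = slots_per_node  # assumed: cache size counted in files
        self.access_counts = Counter()        # file name -> number of recorded reads
        self.location = {}                    # file name -> node holding the cached copy

    def record_access(self, filename):
        """Register one read of a file (popularity bookkeeping)."""
        self.access_counts[filename] += 1

    def rebalance(self):
        """Spread the most popular files round-robin over all node caches."""
        capacity = len(self.nodes) * self.slots_per_node
        popular = [name for name, _ in self.access_counts.most_common(capacity)]
        self.location = {
            name: self.nodes[i % len(self.nodes)]
            for i, name in enumerate(popular)
        }

    def locate(self, filename):
        """Return the caching node, or None if the file must be read
        from the background storage over the network."""
        return self.location.get(filename)


if __name__ == "__main__":
    coordinator = CacheCoordinator(["node1", "node2"], slots_per_node=1)
    for name in ["a.root", "a.root", "a.root", "b.root", "b.root", "c.root"]:
        coordinator.record_access(name)
    coordinator.rebalance()
    for name in ["a.root", "b.root", "c.root"]:
        print(name, "->", coordinator.locate(name) or "background storage")
```

In this sketch a job scheduler could call `locate` to send a job to the node that already caches its input, falling back to the background storage for unpopular files. Real deployments would also need eviction, cache consistency, and size-aware placement, none of which are modeled here.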