
Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN


Bibliographic Details
Main Author: Blomer, Jakob
Language: eng
Published: 2012
Subjects: Computing and Computers
Online Access: http://cds.cern.ch/record/1462821
_version_ 1780925319888240640
author Blomer, Jakob
author_facet Blomer, Jakob
author_sort Blomer, Jakob
collection CERN
description The computing facilities used to process data for the experiments at the Large Hadron Collider (LHC) at CERN are scattered around the world. The embarrassingly parallel workload allows for use of various computing resources, such as computer centers comprising the Worldwide LHC Computing Grid, commercial and institutional cloud resources, as well as individual home PCs in “volunteer clouds”. Unlike data, the experiment software and its operating system dependencies cannot be easily split into small chunks. Deployment of experiment software on distributed grid sites is challenging since it consists of millions of small files and changes frequently. This thesis develops a systematic approach to distribute a homogeneous runtime environment to a heterogeneous and geographically distributed computing infrastructure. A uniform bootstrap environment is provided by a minimal virtual machine tailored to LHC applications. Based on a study of the characteristics of LHC experiment software, the thesis argues for the use of content-addressable storage and decentralized caching in order to distribute the experiment software. In order to utilize the technology at the required scale, new methods of pre-processing data into content-addressable storage are developed. A co-operative, decentralized memory cache is designed that is optimized for the high peer churn expected in future virtualized computing clusters. This is achieved using a combination of consistent hashing with global knowledge about the worker nodes’ state. The methods have been implemented in the form of a file system for software and Conditions Data delivery. The file system has been widely adopted by the LHC community and the benefits of the presented methods have been demonstrated in practice.
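The central technique in the description, content-addressable storage, can be made concrete with a small sketch. This is not the thesis's implementation (that is a file system for software delivery); it is a minimal, hypothetical Python example, and the function names, the choice of SHA-256, and the two-character directory fan-out are all illustrative assumptions.

    import hashlib
    import os

    def store_object(store_dir, data):
        # Content-addressable storage: the object's name is the hash of
        # its contents. Identical files therefore collapse into a single
        # stored copy, and integrity can be verified by re-hashing.
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(store_dir, digest[:2], digest[2:])
        if not os.path.exists(path):  # already present: deduplicated for free
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
        return digest  # the content address, stable across hosts and file names

    def load_object(store_dir, digest):
        # Retrieval needs only the address, which also authenticates the data.
        with open(os.path.join(store_dir, digest[:2], digest[2:]), "rb") as f:
            return f.read()

Because the address is derived from the content, millions of small, frequently changing software files reduce to a set of immutable objects that any cache may replicate without coordination.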
id cern-1462821
institution European Organization for Nuclear Research
language eng
publishDate 2012
record_format invenio
spelling cern-1462821
2019-09-30T06:29:59Z
http://cds.cern.ch/record/1462821
eng
Blomer, Jakob
Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
Computing and Computers
The computing facilities used to process data for the experiments at the Large Hadron Collider (LHC) at CERN are scattered around the world. The embarrassingly parallel workload allows for use of various computing resources, such as computer centers comprising the Worldwide LHC Computing Grid, commercial and institutional cloud resources, as well as individual home PCs in “volunteer clouds”. Unlike data, the experiment software and its operating system dependencies cannot be easily split into small chunks. Deployment of experiment software on distributed grid sites is challenging since it consists of millions of small files and changes frequently. This thesis develops a systematic approach to distribute a homogeneous runtime environment to a heterogeneous and geographically distributed computing infrastructure. A uniform bootstrap environment is provided by a minimal virtual machine tailored to LHC applications. Based on a study of the characteristics of LHC experiment software, the thesis argues for the use of content-addressable storage and decentralized caching in order to distribute the experiment software. In order to utilize the technology at the required scale, new methods of pre-processing data into content-addressable storage are developed. A co-operative, decentralized memory cache is designed that is optimized for the high peer churn expected in future virtualized computing clusters. This is achieved using a combination of consistent hashing with global knowledge about the worker nodes’ state. The methods have been implemented in the form of a file system for software and Conditions Data delivery. The file system has been widely adopted by the LHC community and the benefits of the presented methods have been demonstrated in practice.
CERN-THESIS-2011-251
oai:cds.cern.ch:1462821
2012-07-19T21:20:20Z
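The "combination of consistent hashing with global knowledge about the worker nodes' state" named in the description can be sketched generically. The class below is not the thesis's cooperative memory cache: the class name, the 64 virtual points per node, and the MD5 ring hash are illustrative assumptions, and "global knowledge" is modeled only as every peer holding the same complete node list.

    import bisect
    import hashlib

    def _point(key):
        # Position of a key on the hash ring (any uniform hash works here).
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class ConsistentHashRing:
        """Maps cache keys to worker nodes so that adding or removing a
        node (peer churn) only remaps the keys adjacent to it on the ring."""

        def __init__(self, nodes=(), replicas=64):
            self.replicas = replicas  # virtual points per node, smooths the load
            self._ring = []           # sorted list of (point, node)
            for node in nodes:
                self.add(node)

        def add(self, node):
            for i in range(self.replicas):
                bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

        def remove(self, node):
            self._ring = [(p, n) for p, n in self._ring if n != node]

        def lookup(self, key):
            # First ring point at or after the key's position, wrapping around.
            if not self._ring:
                raise LookupError("no nodes in the ring")
            i = bisect.bisect(self._ring, (_point(key), ""))
            return self._ring[i % len(self._ring)][1]

    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    owner = ring.lookup("/software/lib/libgeometry.so")  # hypothetical cache key
    ring.remove("node-b")  # churn: only keys owned by node-b move elsewhere

With consistent hashing alone, each peer can compute a key's owner locally; a shared global view of node state keeps the ring identical on all peers, which is what lets the cache cooperate without a central coordinator.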
spellingShingle Computing and Computers
Blomer, Jakob
Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_full Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_fullStr Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_full_unstemmed Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_short Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_sort decentralized data storage and processing in the context of the lhc experiments at cern
topic Computing and Computers
url http://cds.cern.ch/record/1462821
work_keys_str_mv AT blomerjakob decentralizeddatastorageandprocessinginthecontextofthelhcexperimentsatcern