Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
Main author: | Blomer, Jakob |
---|---|
Language: | eng |
Published: | 2012 |
Subjects: | Computing and Computers |
Online access: | http://cds.cern.ch/record/1462821 |
_version_ | 1780925319888240640 |
---|---|
author | Blomer, Jakob |
author_facet | Blomer, Jakob |
author_sort | Blomer, Jakob |
collection | CERN |
description | The computing facilities used to process data for the experiments at the Large Hadron Collider (LHC) at CERN are scattered around the world. The embarrassingly parallel workload allows for use of various computing resources, such as computer centers comprising the Worldwide LHC Computing Grid, commercial and institutional cloud resources, as well as individual home PCs in “volunteer clouds”. Unlike data, the experiment software and its operating system dependencies cannot be easily split into small chunks. Deployment of experiment software on distributed grid sites is challenging since it consists of millions of small files and changes frequently. This thesis develops a systematic approach to distribute a homogeneous runtime environment to a heterogeneous and geographically distributed computing infrastructure. A uniform bootstrap environment is provided by a minimal virtual machine tailored to LHC applications. Based on a study of the characteristics of LHC experiment software, the thesis argues for the use of content-addressable storage and decentralized caching in order to distribute the experiment software. In order to utilize the technology at the required scale, new methods of pre-processing data into content-addressable storage are developed. A co-operative, decentralized memory cache is designed that is optimized for the high peer churn expected in future virtualized computing clusters. This is achieved using a combination of consistent hashing with global knowledge about the worker nodes’ state. The methods have been implemented in the form of a file system for software and Conditions Data delivery. The file system has been widely adopted by the LHC community and the benefits of the presented methods have been demonstrated in practice. |
id | cern-1462821 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2012 |
record_format | invenio |
spelling | cern-1462821; 2019-09-30T06:29:59Z; http://cds.cern.ch/record/1462821; eng; Blomer, Jakob; Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN; Computing and Computers; [abstract as in the description field above]; CERN-THESIS-2011-251; oai:cds.cern.ch:1462821; 2012-07-19T21:20:20Z |
spellingShingle | Computing and Computers Blomer, Jakob Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN |
title | Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN |
title_full | Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN |
title_fullStr | Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN |
title_full_unstemmed | Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN |
title_short | Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN |
title_sort | decentralized data storage and processing in the context of the lhc experiments at cern |
topic | Computing and Computers |
url | http://cds.cern.ch/record/1462821 |
work_keys_str_mv | AT blomerjakob decentralizeddatastorageandprocessinginthecontextofthelhcexperimentsatcern |
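
The abstract above argues for content-addressable storage (CAS) to distribute experiment software: each file is stored and fetched under a hash of its contents, so the many identical files across software releases are deduplicated and can be cached indefinitely. The following Python sketch illustrates the general technique only; the store layout, hash choice, and function names are illustrative assumptions, not the actual on-disk format of the file system the thesis describes (CernVM-FS).

```python
import hashlib
import os

STORE = "cas-store"  # hypothetical local object store directory


def cas_put(data: bytes) -> str:
    """Store a blob under the hex digest of its contents; return the key.

    Identical blobs map to the same key, so duplicated files in a
    software release are stored (and transferred) only once.
    """
    key = hashlib.sha1(data).hexdigest()
    path = os.path.join(STORE, key[:2], key[2:])
    if not os.path.exists(path):  # deduplication: skip already-known content
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return key


def cas_get(key: str) -> bytes:
    """Fetch a blob by content hash; the hash doubles as an integrity check."""
    path = os.path.join(STORE, key[:2], key[2:])
    with open(path, "rb") as f:
        data = f.read()
    assert hashlib.sha1(data).hexdigest() == key, "corrupted object"
    return data
```

Because the key is derived from the content, a cached object can never go stale: a changed file simply gets a new key, which is why CAS suits software that "consists of millions of small files and changes frequently".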
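The abstract also describes a cooperative, decentralized memory cache that places data with consistent hashing, so that nodes joining or leaving a virtualized cluster (high peer churn) relocate only a small fraction of the cached keys. Below is a minimal sketch of plain consistent hashing with virtual points; the thesis combines this with global knowledge of the worker nodes' state, which is not modeled here, and all node names and parameters are illustrative.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map keys to cache nodes on a hash ring with virtual points per node."""

    def __init__(self, nodes=(), points_per_node=64):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_node(self, node: str):
        # A joining node takes over only the keys between its points and
        # their predecessors; all other keys keep their old owner, which
        # limits cache reshuffling under churn.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self._ring = [p for p in self._ring if p[1] != node]

    def lookup(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash.
        i = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[i % len(self._ring)][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("libGeant4.so"))  # cache node responsible for this file
```

The key property is that removing one of n nodes invalidates only about 1/n of the cached keys rather than reshuffling everything, which is what makes the scheme suitable for clusters where worker nodes appear and disappear frequently.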