
Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN


Bibliographic Details
Main Author: Blomer, Jakob
Language: eng
Published: 2012
Subjects: Computing and Computers
Online Access: http://cds.cern.ch/record/1462821
_version_ 1780925319888240640
author Blomer, Jakob
author_facet Blomer, Jakob
author_sort Blomer, Jakob
collection CERN
description The computing facilities used to process data for the experiments at the Large Hadron Collider (LHC) at CERN are scattered around the world. The embarrassingly parallel workload allows for use of various computing resources, such as computer centers comprising the Worldwide LHC Computing Grid, commercial and institutional cloud resources, as well as individual home PCs in “volunteer clouds”. Unlike data, the experiment software and its operating system dependencies cannot be easily split into small chunks. Deployment of experiment software on distributed grid sites is challenging since it consists of millions of small files and changes frequently. This thesis develops a systematic approach to distribute a homogeneous runtime environment to a heterogeneous and geographically distributed computing infrastructure. A uniform bootstrap environment is provided by a minimal virtual machine tailored to LHC applications. Based on a study of the characteristics of LHC experiment software, the thesis argues for the use of content-addressable storage and decentralized caching in order to distribute the experiment software. In order to utilize the technology at the required scale, new methods of pre-processing data into content-addressable storage are developed. A co-operative, decentralized memory cache is designed that is optimized for the high peer churn expected in future virtualized computing clusters. This is achieved using a combination of consistent hashing with global knowledge about the worker nodes’ state. The methods have been implemented in the form of a file system for software and Conditions Data delivery. The file system has been widely adopted by the LHC community and the benefits of the presented methods have been demonstrated in practice.
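The central technique in the description, content-addressable storage, can be made concrete with a small sketch. This is not the thesis's implementation (that is a file system for software delivery); it is a minimal, hypothetical Python example, and the function names, the choice of SHA-256, and the two-character directory fan-out are all illustrative assumptions.

    import hashlib
    import os

    def store_object(store_dir, data):
        # Content-addressable storage: the object's name is the hash of
        # its contents. Identical files therefore collapse into a single
        # stored copy, and integrity can be verified by re-hashing.
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(store_dir, digest[:2], digest[2:])
        if not os.path.exists(path):  # already present: deduplicated for free
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
        return digest  # the content address, stable across hosts and file names

    def load_object(store_dir, digest):
        # Retrieval needs only the address, which also authenticates the data.
        with open(os.path.join(store_dir, digest[:2], digest[2:]), "rb") as f:
            return f.read()

Because the address is derived from the content, millions of small, frequently changing software files reduce to a set of immutable objects that any cache may replicate without coordination.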
id cern-1462821
institution European Organization for Nuclear Research
language eng
publishDate 2012
record_format invenio
spelling cern-1462821
2019-09-30T06:29:59Z
http://cds.cern.ch/record/1462821
eng
Blomer, Jakob
Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
Computing and Computers
The computing facilities used to process data for the experiments at the Large Hadron Collider (LHC) at CERN are scattered around the world. The embarrassingly parallel workload allows for use of various computing resources, such as computer centers comprising the Worldwide LHC Computing Grid, commercial and institutional cloud resources, as well as individual home PCs in “volunteer clouds”. Unlike data, the experiment software and its operating system dependencies cannot be easily split into small chunks. Deployment of experiment software on distributed grid sites is challenging since it consists of millions of small files and changes frequently. This thesis develops a systematic approach to distribute a homogeneous runtime environment to a heterogeneous and geographically distributed computing infrastructure. A uniform bootstrap environment is provided by a minimal virtual machine tailored to LHC applications. Based on a study of the characteristics of LHC experiment software, the thesis argues for the use of content-addressable storage and decentralized caching in order to distribute the experiment software. In order to utilize the technology at the required scale, new methods of pre-processing data into content-addressable storage are developed. A co-operative, decentralized memory cache is designed that is optimized for the high peer churn expected in future virtualized computing clusters. This is achieved using a combination of consistent hashing with global knowledge about the worker nodes’ state. The methods have been implemented in the form of a file system for software and Conditions Data delivery. The file system has been widely adopted by the LHC community and the benefits of the presented methods have been demonstrated in practice.
CERN-THESIS-2011-251
oai:cds.cern.ch:1462821
2012-07-19T21:20:20Z
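The "combination of consistent hashing with global knowledge about the worker nodes' state" named in the description can be sketched generically. The class below is not the thesis's cooperative memory cache: the class name, the 64 virtual points per node, and the MD5 ring hash are illustrative assumptions, and "global knowledge" is modeled only as every peer holding the same complete node list.

    import bisect
    import hashlib

    def _point(key):
        # Position of a key on the hash ring (any uniform hash works here).
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class ConsistentHashRing:
        """Maps cache keys to worker nodes so that adding or removing a
        node (peer churn) only remaps the keys adjacent to it on the ring."""

        def __init__(self, nodes=(), replicas=64):
            self.replicas = replicas  # virtual points per node, smooths the load
            self._ring = []           # sorted list of (point, node)
            for node in nodes:
                self.add(node)

        def add(self, node):
            for i in range(self.replicas):
                bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

        def remove(self, node):
            self._ring = [(p, n) for p, n in self._ring if n != node]

        def lookup(self, key):
            # First ring point at or after the key's position, wrapping around.
            if not self._ring:
                raise LookupError("no nodes in the ring")
            i = bisect.bisect(self._ring, (_point(key), ""))
            return self._ring[i % len(self._ring)][1]

    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    owner = ring.lookup("/software/lib/libgeometry.so")  # hypothetical cache key
    ring.remove("node-b")  # churn: only keys owned by node-b move elsewhere

With consistent hashing alone, each peer can compute a key's owner locally; a shared global view of node state keeps the ring identical on all peers, which is what lets the cache cooperate without a central coordinator.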
spellingShingle Computing and Computers
Blomer, Jakob
Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_full Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_fullStr Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_full_unstemmed Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_short Decentralized Data Storage and Processing in the Context of the LHC Experiments at CERN
title_sort decentralized data storage and processing in the context of the lhc experiments at cern
topic Computing and Computers
url http://cds.cern.ch/record/1462821
work_keys_str_mv AT blomerjakob decentralizeddatastorageandprocessinginthecontextofthelhcexperimentsatcern