
Using the Hadoop/MapReduce approach for monitoring the CERN storage system and improving the ATLAS computing model



Bibliographic Details
Main author:	Russo, Stefano Alberto
Language:	eng
Published:	2013
Subjects:	Computing and Computers
Online access:	http://cds.cern.ch/record/1557917

_version_	1780930559466274816
author	Russo, Stefano Alberto
author_facet	Russo, Stefano Alberto
author_sort	Russo, Stefano Alberto
collection	CERN
description	The processing of huge amounts of data, already a fundamental task for research in elementary particle physics, is becoming increasingly important for companies operating in the Information Technology (IT) industry as well. In this context, conventional approaches give rise to several problems, starting with the congestion of the communication channels. In the IT sector, one of the approaches designed to minimize this congestion is to exploit data locality, or in other words, to bring the computation as close as possible to where the data resides. The most common implementation of this concept is the Hadoop/MapReduce framework. In this thesis work I evaluate the usage of Hadoop/MapReduce in two areas: a standard one similar to typical IT analyses, and an innovative one related to high energy physics analyses. The first consists of monitoring the history of the storage cluster that stores the data generated by the LHC experiments; the second, of the physics analysis of those data, in particular the data generated by the ATLAS experiment. In Chapter 2, I introduce the environment in which I have been working: CERN, the LHC and the ATLAS experiment, while in Chapter 3 I describe the computing model of the LHC experiments, giving particular attention to ATLAS. In Chapter 4, I cover the Hadoop/MapReduce framework, together with the context in which it has been developed and the factors that have led to the growing importance of approaches centered on data locality. In Chapter 5, I present the work I have done on monitoring the storage cluster for the data generated by the LHC experiments, both in real time and with respect to its history, walking through the steps that led to adopting Hadoop/MapReduce in this context.
Chapter 6 is the core of this thesis: I explain how a typical high energy physics analysis can be ported to the MapReduce model and how the entire Hadoop/MapReduce framework can be used in this field. Finally, I conclude this thesis work by testing this approach on a real case, the top quark cross section measurement analysis, which I present in Chapter 7 together with the results obtained.
id	cern-1557917
institution	European Organization for Nuclear Research
language	eng
publishDate	2013
record_format	invenio
spelling	cern-1557917 2019-09-30T06:29:59Z http://cds.cern.ch/record/1557917 eng Russo, Stefano Alberto Computing and Computers CERN-THESIS-2013-067 oai:cds.cern.ch:1557917 2013-06-25T17:16:56Z
spellingShingle	Computing and Computers Russo, Stefano Alberto Using the Hadoop/MapReduce approach for monitoring the CERN storage system and improving the ATLAS computing model
title	Using the Hadoop/MapReduce approach for monitoring the CERN storage system and improving the ATLAS computing model
title_full	Using the Hadoop/MapReduce approach for monitoring the CERN storage system and improving the ATLAS computing model
title_fullStr	Using the Hadoop/MapReduce approach for monitoring the CERN storage system and improving the ATLAS computing model
title_full_unstemmed	Using the Hadoop/MapReduce approach for monitoring the CERN storage system and improving the ATLAS computing model
title_short	Using the Hadoop/MapReduce approach for monitoring the CERN storage system and improving the ATLAS computing model
title_sort	using the hadoop/mapreduce approach for monitoring the cern storage system and improving the atlas computing model
topic	Computing and Computers
url	http://cds.cern.ch/record/1557917
work_keys_str_mv	AT russostefanoalberto usingthehadoopmapreduceapproachformonitoringthecernstoragesystemandimprovingtheatlascomputingmodel


Similar Items