
Using the Hadoop/MapReduce approach for monitoring the CERN storage system and improving the ATLAS computing model


Bibliographic Details
Main Author: Russo, Stefano Alberto
Language: eng
Published: 2013
Subjects:
Online Access: http://cds.cern.ch/record/1557917
Description
Summary: The processing of huge amounts of data, already a fundamental task for research in elementary particle physics, is becoming increasingly important for companies operating in the Information Technology (IT) industry as well. In this context, conventional approaches give rise to several problems, starting with the congestion of communication channels. In the IT sector, one of the approaches designed to minimize this congestion is to exploit data locality, or in other words, to bring the computation as close as possible to where the data resides. The most common implementation of this concept is the Hadoop/MapReduce framework. In this thesis I evaluate the use of Hadoop/MapReduce in two areas: a standard one similar to typical IT analyses, and an innovative one related to high energy physics analyses. The first consists of monitoring the history of the storage cluster that stores the data generated by the LHC experiments; the second, of the physics analysis of those data, in particular the data generated by the ATLAS experiment. In Chapter 2, I introduce the environment in which I have been working: CERN, the LHC and the ATLAS experiment, while in Chapter 3 I describe the computing model of the LHC experiments, giving particular attention to ATLAS. In Chapter 4, I cover the Hadoop/MapReduce framework, together with the context in which it was developed and the factors that have led to the growing importance of approaches centered on data locality. In Chapter 5, I present the work I have done on monitoring the storage cluster for the data generated by the LHC experiments, both in real time and with respect to its history, walking through the steps that led to adopting Hadoop/MapReduce in this context.
Chapter 6 is the core of this thesis: I explain how a typical high energy physics analysis can be ported to the MapReduce model and how the entire Hadoop/MapReduce framework can be used in this field. Finally, I conclude the thesis by testing this approach on a real case, the top quark cross section measurement analysis, which I present in Chapter 7 together with the results obtained.