
Blurring High Energy Physics Data Analysis Techniques and Data Science Approaches

Scientific research has always been intertwined with computing to a certain degree. This has been even more the case over the last few years, during which the need for storage and processing power has increased exponentially. This holds true for many different joint collaborations in fields such...


Bibliographic Details
Main author: Padulano, Vincenzo Eduardo
Language: eng
Published: 2019
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/2693575
_version_ 1780964035777265664
author Padulano, Vincenzo Eduardo
author_facet Padulano, Vincenzo Eduardo
author_sort Padulano, Vincenzo Eduardo
collection CERN
description Scientific research has always been intertwined with computing to a certain degree. This has been even more the case over the last few years, during which the need for storage and processing power has increased exponentially. This holds true for many different joint collaborations in fields such as biology, medicine, earth sciences, physics and astrophysics, among which CERN is a notable example. As the largest centre for research in High Energy Physics (HEP), it has always kept pushing for new discoveries in its executive program. The collaborative efforts of thousands of scientists worldwide have led to important results, most notably in recent years the discovery of the Higgs boson, officially announced in 2012 by the researchers at CMS and ATLAS, the two main experiments taking data at the LHC. This strenuous work demands the most advanced technological instruments to recreate the physics events and, at the same time, hardware and software that keep up with the computing needs. But while HEP has historically been at the forefront in developing solutions to cope with these requirements, in recent years other fields and industries have experienced steady advances, helped by an unprecedented abundance of data. The research field born to exploit data, namely Data Science, has brought to the table new computing techniques that may well fit the needs of HEP. In this thesis, a programming model commonly used in Data Science, namely MapReduce, will be exploited to work with the most prominent software for HEP analysis, ROOT. The former will be used in the implementation available under the Apache Spark framework to allow for distributing computations over a remote cluster, while the latter will provide the interface to common HEP data formats and analysis models through one of its latest additions, namely RDataFrame. PyRDF, a package developed for this purpose, will glue all the components together and will be used to showcase how this new model can affect the workflow of a physics analysis.
id cern-2693575
institution European Organization for Nuclear Research
language eng
publishDate 2019
record_format invenio
spelling cern-2693575; 2019-11-25T15:31:15Z; http://cds.cern.ch/record/2693575; eng; Padulano, Vincenzo Eduardo; Blurring High Energy Physics Data Analysis Techniques and Data Science Approaches; Computing and Computers; CERN-THESIS-2019-161; oai:cds.cern.ch:2693575; 2019-10-15T09:20:52Z
spellingShingle Computing and Computers
Padulano, Vincenzo Eduardo
Blurring High Energy Physics Data Analysis Techniques and Data Science Approaches
title Blurring High Energy Physics Data Analysis Techniques and Data Science Approaches
title_full Blurring High Energy Physics Data Analysis Techniques and Data Science Approaches
title_fullStr Blurring High Energy Physics Data Analysis Techniques and Data Science Approaches
title_full_unstemmed Blurring High Energy Physics Data Analysis Techniques and Data Science Approaches
title_short Blurring High Energy Physics Data Analysis Techniques and Data Science Approaches
title_sort blurring high energy physics data analysis techniques and data science approaches
topic Computing and Computers
url http://cds.cern.ch/record/2693575
work_keys_str_mv AT padulanovincenzoeduardo blurringhighenergyphysicsdataanalysistechniquesanddatascienceapproaches