Using RDataFrame, ROOT’s declarative analysis tool, in a CMS physics study

Bibliographic Details
Main authors: Manca, Elisabetta; Guiraud, Enrico
Language: English
Published: 2019
Subjects: EP Software Seminar
Online access: http://cds.cern.ch/record/2694107

collection CERN
description With the expected large increase in the amount of available data in LHC Run 3, HEP scientists must now, more than ever, be able to write robust, performant analysis software efficiently and take full advantage of the underlying hardware. Multicore computing resources are commonplace, and current trends in scientific computing point to increasing availability of manycore architectures. The HEP community is not alone in this challenge: the data science industry has developed solutions that we can learn from and adapt to HEP-specific problems.

This is the context in which the ROOT team (in particular Enrico) developed RDataFrame, a Swiss Army knife for data manipulation that provides a high-level interface in C++ and Python, as well as transparent optimizations such as multi-threaded data parallelism. This new tool supports typical HEP workflows and data formats, and it has been designed to scale flexibly from data exploration on a laptop to the analysis of millions of events exploiting hundreds of CPU cores. As a result, ROOT users can now write simpler code that runs faster. The first part of the seminar will introduce RDataFrame (RDF), showcase its most prominent features, and outline current developments and several real-world use cases.

Precision measurements are often affected by large systematic uncertainties related to the models used in simulation, and progress can be made by extracting features directly from data. However, analyzing unprecedented numbers of events on a sustainable timescale is not possible with standard techniques. The second part of the seminar demonstrates, within the setup of a CMS physics study, how ROOT's RDataFrame can be used to overcome these limitations.
id cern-2694107
institution European Organization for Nuclear Research
record_format invenio
topic EP Software Seminar
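The title and abstract describe RDataFrame as declarative: an analysis states what to compute (new columns, selections, results) and the framework decides how to run it, filling all booked results in one event loop that starts only when a result is first accessed and that can be multi-threaded transparently. The toy below imitates only the lazy, single-pass part of that design in plain Python so that it runs without ROOT. `ToyFrame`, `Result`, `_Loop` and their lowercase methods are hypothetical names invented for this sketch; they are not ROOT's API or implementation, and threading is deliberately left out. In real PyROOT code a comparable chain would be written as `ROOT.RDataFrame(100).Define("x", "(double)rdfentry_").Filter("x > 49.5").Count()`, with `ROOT.EnableImplicitMT()` called beforehand to enable multi-threading.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]
Step = Tuple[str, Optional[str], Callable[[Event], Any]]


class _Loop:
    """Shared event loop: runs every pending booked result in a single pass."""

    def __init__(self, n_entries: int) -> None:
        self.n_entries = n_entries
        self.actions: List["Result"] = []
        self.runs = 0  # number of times the event loop has actually executed

    def run(self) -> None:
        pending = [a for a in self.actions if not a.ready]
        for entry in range(self.n_entries):
            for action in pending:
                event = action.frame.process(entry)
                if event is not None:  # None means a filter rejected the entry
                    action.update(event)
        for action in pending:
            action.ready = True
        self.runs += 1


class Result:
    """Lazy handle to a booked result; reading .value triggers the loop once."""

    def __init__(self, frame: "ToyFrame", init: Any,
                 fold: Callable[[Any, Event], Any]) -> None:
        self.frame = frame
        self.ready = False
        self._acc = init
        self._fold = fold

    def update(self, event: Event) -> None:
        self._acc = self._fold(self._acc, event)

    @property
    def value(self) -> Any:
        if not self.ready:
            self.frame.loop.run()
        return self._acc


class ToyFrame:
    """Immutable chain of define/filter steps that shares one _Loop."""

    def __init__(self, n_entries: int = 0, *, loop: Optional[_Loop] = None,
                 steps: Tuple[Step, ...] = ()) -> None:
        self.loop = loop if loop is not None else _Loop(n_entries)
        self.steps = steps

    def define(self, name: str, fn: Callable[[Event], Any]) -> "ToyFrame":
        # Records a new column; nothing is computed yet.
        return ToyFrame(loop=self.loop, steps=self.steps + (("define", name, fn),))

    def filter(self, fn: Callable[[Event], bool]) -> "ToyFrame":
        # Records a selection; nothing is computed yet.
        return ToyFrame(loop=self.loop, steps=self.steps + (("filter", None, fn),))

    def process(self, entry: int) -> Optional[Event]:
        """Apply the recorded steps to one entry; None if a filter rejects it."""
        event: Event = {"entry": entry}
        for kind, name, fn in self.steps:
            if kind == "define":
                event[name] = fn(event)
            elif not fn(event):
                return None
        return event

    def _book(self, init: Any, fold: Callable[[Any, Event], Any]) -> Result:
        result = Result(self, init, fold)
        self.loop.actions.append(result)
        return result

    def count(self) -> Result:
        return self._book(0, lambda acc, ev: acc + 1)

    def sum(self, column: str) -> Result:
        return self._book(0.0, lambda acc, ev: acc + ev[column])


frame = ToyFrame(100).define("x", lambda ev: float(ev["entry"]))
selected = frame.filter(lambda ev: ev["x"] > 49.5)
n = selected.count()        # booked only
total = selected.sum("x")   # booked only
print(frame.loop.runs)          # 0: no loop has run yet
print(n.value, total.value)     # 50 3725.0, both filled by one loop
print(frame.loop.runs)          # 1
```

Recomputing the step chain separately for each booked result keeps the sketch short; RDataFrame instead caches selection outcomes and defined-column values per entry and, with implicit multi-threading enabled, splits the entries across threads, neither of which this toy attempts.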