Novel functional and distributed approaches to data analysis available in ROOT

Bibliographic details
Main authors: Amadio, G; Blomer, J; Canal, P; Ganis, G; Guiraud, E; Mato Vila, P; Moneta, L; Piparo, D; Tejedor, E; Valls Pla, X
Language: English
Published: 2018
Online access: https://dx.doi.org/10.1088/1742-6596/1085/4/042008
http://cds.cern.ch/record/2664841
Collection: CERN
Description: The bright future of particle physics at the Energy and Intensity frontiers poses exciting challenges to the scientific software community. The traditional strategies for processing and analysing data are evolving in order to (i) offer higher-level programming models and (ii) exploit parallelism to cope with the ever-increasing complexity and size of the datasets. This contribution describes how the ROOT framework, a cornerstone of software stacks dedicated to particle physics, is preparing to provide adequate solutions for analysing large amounts of scientific data on parallel architectures. It then characterises the functional approach to parallel data analysis provided by the ROOT TDataFrame interface. The design choices behind this new interface are described and compared with those of other widely adopted tools such as Pandas and Apache Spark. The programming model is illustrated, highlighting the reduction of boilerplate code, the composability of actions and data transformations, and the ability to deal with different data sources such as ROOT, JSON, CSV or databases. Details are given about how the functional approach allows transparent, implicit parallelisation of the chain of operations specified by the user. The progress made in the field of distributed analysis is examined; in particular, the power of integrating ROOT with Apache Spark via the PyROOT interface is shown. In addition, the building blocks for expressing parallelism in ROOT are briefly characterised, together with the structural changes to the build and test infrastructure that were necessary to put them into production.
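The description above refers to a functional model in which the user declares a chain of transformations and actions, and the framework parallelises that chain transparently. As an illustration only, here is a minimal pure-Python toy under stated assumptions: `MiniFrame`, `LazyResult` and their methods are hypothetical names invented for this sketch and are not the ROOT TDataFrame API. It shows three ideas: transformations (`filter`, `define`) return new nodes without touching the data, actions (`count`, `sum`) only book a result, and the first access to any result triggers a single pass in which data partitions are processed by a thread pool and the partial results are merged.

```python
# Toy sketch of a lazily evaluated, functional data-frame chain.
# Hypothetical names throughout (MiniFrame, LazyResult, filter, define,
# count, sum, run): this is NOT the ROOT TDataFrame API.
from concurrent.futures import ThreadPoolExecutor


class LazyResult:
    """Placeholder for a booked action; filled in by the first event loop."""

    def __init__(self, root):
        self._root = root
        self._value = None
        self._ready = False

    @property
    def value(self):
        if not self._ready:
            self._root.run()  # one pass computes every action booked so far
        return self._value


def _apply(steps, row):
    """Run a chain of steps on one row; return None if a filter rejects it."""
    row = dict(row)  # copy, so 'define' never mutates the input data
    for step in steps:
        if step[0] == "filter":
            if not step[1](row):
                return None
        else:  # ("define", name, func): add a derived column
            _, name, func = step
            row[name] = func(row)
    return row


def _process(chunk, actions):
    """Compute partial results of all booked actions on one data partition."""
    partials = [0] * len(actions)
    for row in chunk:
        for i, (kind, column, steps, _) in enumerate(actions):
            out = _apply(steps, row)
            if out is not None:
                partials[i] += 1 if kind == "count" else out[column]
    return partials


class MiniFrame:
    def __init__(self, rows, steps=None, root=None):
        self._rows = rows
        self._steps = steps if steps is not None else []
        self._root = root if root is not None else self
        if root is None:  # only the head node owns the booked actions
            self._pending = []
            self._runs = 0

    # Transformations: return a new node, touch no data.
    def filter(self, pred):
        return MiniFrame(self._rows, self._steps + [("filter", pred)], self._root)

    def define(self, name, func):
        return MiniFrame(self._rows, self._steps + [("define", name, func)], self._root)

    # Actions: only book a result; nothing is computed yet.
    def _book(self, kind, column):
        result = LazyResult(self._root)
        self._root._pending.append((kind, column, self._steps, result))
        return result

    def count(self):
        return self._book("count", None)

    def sum(self, column):
        return self._book("sum", column)

    def run(self, n_workers=4):
        """Single parallel pass: split rows, process partitions, merge partials."""
        root = self._root
        actions, root._pending = root._pending, []
        if not actions:
            return
        root._runs += 1
        size = max(1, -(-len(root._rows) // n_workers))  # ceiling division
        chunks = [root._rows[i:i + size] for i in range(0, len(root._rows), size)]
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            partials = list(pool.map(lambda c: _process(c, actions), chunks))
        for i, (_, _, _, result) in enumerate(actions):
            # count and sum both merge by addition across partitions
            result._value = sum(p[i] for p in partials)
            result._ready = True


if __name__ == "__main__":
    rows = [{"x": float(i)} for i in range(10)]
    sel = MiniFrame(rows).filter(lambda r: r["x"] > 4).define("x2", lambda r: r["x"] ** 2)
    n = sel.count()          # booked, not computed
    s = sel.sum("x2")        # booked, not computed
    print(n.value, s.value)  # prints: 5 255.0 (one pass fills both)
```

Because both toy actions merge by addition, the per-partition results can simply be summed; the user-facing chain is identical whether one worker or many process the data, which is the sense in which the parallelism is implicit here. For brevity each action re-applies its own chain of steps, whereas a real engine would share the common nodes of the computation graph. CPython threads do not speed up CPU-bound pure-Python code, so the thread pool only illustrates the structure of the parallel pass.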
Record ID: oai-inspirehep.net-1699879
Institution: European Organization for Nuclear Research
Subjects: Computing and Computers