Evaluating awkward arrays, uproot, and coffea as a query platform for High Energy Physics data

Bibliographic details
Main authors: Smith, Nicholas Charles; Gray, Lindsey Andrew
Language: English
Published: 2022
Online access: https://dx.doi.org/10.1088/1742-6596/2438/1/012033
http://cds.cern.ch/record/2806236

Abstract

Query languages for High Energy Physics (HEP) are an ever-present topic within the field. A query language that can efficiently represent the nested data structures encoding the statistical and physical meaning of HEP data will help analysts by making their code clearer and more pertinent. As the result of a multi-year effort to develop an in-memory columnar representation of high energy physics data, the NumPy, awkward arrays, and uproot Python packages present a mature and efficient interface to HEP data. Atop that base, the coffea package adds functionality to launch queries at scale, manage and apply experiment-specific transformations to data, and present a rich object-oriented columnar data representation to the analyst. Recently, a set of Analysis Description Language (ADL) benchmarks has been established to compare HEP queries across multiple languages and frameworks. In this paper we present these benchmark queries implemented within the coffea framework and discuss their readability and performance characteristics. We find that the columnar queries perform as well as or better than the implementations given in previous studies.

Subject: Detectors and Experimental Techniques
Report number: CMS-CR-2022-041
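
The abstract describes an in-memory columnar representation of nested HEP data. As a rough illustration only, the pure-Python sketch below stores a variable-length per-event quantity (jet transverse momentum) as one flat content column plus an offsets column, similar in spirit to the list-offset layout behind jagged arrays in awkward. It then evaluates a query in the style of the ADL benchmarks: the missing transverse energy of events containing at least two jets with pT above 40 GeV. The class `JaggedArray`, the function `met_of_events_with_two_hard_jets`, the 40 GeV cut, and the toy event data are hypothetical. This is not the authors' coffea implementation, and the benchmark definitions themselves are not given in this record.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class JaggedArray:
    """Hypothetical columnar container for one variable-length list per event.

    Loosely modeled on the offsets-plus-content idea used by awkward arrays;
    this is an illustration, not awkward's actual API.
    """
    offsets: List[int]    # event i owns content[offsets[i]:offsets[i + 1]]
    content: List[float]  # entries of all events, flattened into one column

    @classmethod
    def from_lists(cls, nested: Sequence[Sequence[float]]) -> "JaggedArray":
        offsets = [0]
        content: List[float] = []
        for sub in nested:
            content.extend(sub)
            offsets.append(len(content))  # running end position of each event
        return cls(offsets, content)

    def __len__(self) -> int:
        return len(self.offsets) - 1  # number of events

    def counts_where(self, predicate: Callable[[float], bool]) -> List[int]:
        # Build a mask in one pass over the flat column, then reduce per event
        mask = [predicate(x) for x in self.content]
        return [sum(mask[self.offsets[i]:self.offsets[i + 1]])
                for i in range(len(self))]


def met_of_events_with_two_hard_jets(met: List[float], jet_pt: JaggedArray,
                                     pt_cut: float = 40.0) -> List[float]:
    """Hypothetical ADL-style query: MET of events with >= 2 jets above pt_cut (GeV)."""
    if len(met) != len(jet_pt):
        raise ValueError("met and jet_pt must describe the same number of events")
    counts = jet_pt.counts_where(lambda pt: pt > pt_cut)
    return [m for m, n in zip(met, counts) if n >= 2]


# Toy data (hypothetical): four events, the third has no jets at all
jet_pt = JaggedArray.from_lists([[55.0, 42.0, 12.0], [80.0], [], [41.0, 40.5]])
met = [30.0, 25.0, 10.0, 18.0]
print(met_of_events_with_two_hard_jets(met, jet_pt))  # [30.0, 18.0]
```

In awkward itself, the same per-event count is usually written as a boolean mask reduced along the inner axis, for example `ak.sum(jet_pt > 40, axis=1) >= 2`, which runs in compiled code instead of the explicit Python loops used here.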