AwkwardForth: accelerating Uproot with an internal DSL

Bibliographic details
Main author: Pivarski, Jim
Language: English
Published: 2021
Subjects: Conferences
Online access: http://cds.cern.ch/record/2767259
Collection: CERN
Abstract: File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but the deserialization code depends on the type of the data structure, which is not known at compile time. Just-in-time compilation can satisfy both constraints, but we propose a more portable solution: specialized virtual machines. AwkwardForth is a Forth-driven virtual machine for deserializing data into Awkward Arrays. As a language, it is not intended for humans to write, but it loosens the coupling between Uproot and Awkward Array. AwkwardForth programs for deserializing record-oriented formats (ROOT and Avro) are about as fast as C++ ROOT and 10–80× faster than fastavro. Columnar formats (simple TTrees, RNTuple, and Parquet) require specialization only to interpret metadata and are therefore faster with precompiled code.
Record ID: cern-2767259
Institution: European Organization for Nuclear Research (CERN)
Conference: 25th International Conference on Computing in High Energy & Nuclear Physics
OAI identifier: oai:cds.cern.ch:2767259
Record last modified: 2022-11-02T22:25:37Z
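The abstract's central idea is to replace per-type generated native code with a fixed, precompiled interpreter that runs a small type-specific program. A toy stack machine can illustrate this. The following is a minimal Python sketch under stated assumptions. The word names (read-i4, read-f8, +offset), the byte layout (each entry is a little-endian int32 count followed by that many float64 values), and the functions compile_program, generate_program, and run are all hypothetical. They are not AwkwardForth's actual vocabulary or Awkward Array's API.

```python
import struct
from typing import Dict, List

# Toy "specialized virtual machine" for deserialization.
# HYPOTHETICAL: this vocabulary is invented for illustration and is not
# AwkwardForth's real syntax or Awkward Array's API.


def compile_program(source: str) -> list:
    """Parse words into nested instructions: ints for literals,
    ("do", body) for loops, (word, name) for words that take an output."""
    tokens = source.split()
    pos = 0

    def parse_block(until):
        nonlocal pos
        block = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == until:
                return block
            if tok == "do":
                block.append(("do", parse_block("loop")))
            elif tok in ("read-f8", "+offset"):
                block.append((tok, tokens[pos]))  # pair word with output name
                pos += 1
            elif tok.lstrip("-").isdigit():
                block.append(int(tok))
            else:
                block.append(tok)
        if until is not None:
            raise SyntaxError(f"missing {until!r}")
        return block

    return parse_block(None)


def generate_program(num_entries: int) -> str:
    """Emit a program for entries of type "variable-length list of float64".
    A real reader would derive this from type metadata at runtime (assumption)."""
    return (
        f"{num_entries} 0 do "          # for each entry:
        "read-i4 dup +offset offsets "   #   read count, extend offsets
        "0 do read-f8 content loop "     #   copy count float64 values
        "loop"
    )


def run(program: list, data: bytes, outputs: Dict[str, list]) -> Dict[str, list]:
    """Execute a compiled program against `data`, appending to `outputs`."""
    stack: List[int] = []
    pos = 0  # read position in the input bytes

    def execute(block: list) -> None:
        nonlocal pos
        for ins in block:
            if isinstance(ins, int):
                stack.append(ins)  # literal: push onto the stack
            elif ins == "dup":
                stack.append(stack[-1])  # duplicate top of stack
            elif ins == "read-i4":
                (value,) = struct.unpack_from("<i", data, pos)
                pos += 4
                stack.append(value)  # push a little-endian int32
            elif isinstance(ins, tuple):
                word, arg = ins
                if word == "read-f8":
                    (value,) = struct.unpack_from("<d", data, pos)
                    pos += 8
                    outputs[arg].append(value)  # copy a float64 to the output
                elif word == "+offset":
                    out = outputs[arg]
                    out.append(out[-1] + stack.pop())  # running end offset
                else:  # "do": Forth order is `limit start do`
                    start, limit = stack.pop(), stack.pop()
                    for _ in range(start, limit):
                        execute(arg)
            else:
                raise ValueError(f"unknown word: {ins!r}")

    execute(program)
    return outputs


# Usage: serialize three entries in the assumed record-oriented layout.
entries = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
data = b"".join(
    struct.pack("<i", len(e)) + struct.pack(f"<{len(e)}d", *e) for e in entries
)
result = run(compile_program(generate_program(len(entries))), data,
             {"offsets": [0], "content": []})
print(result)  # {'offsets': [0, 3, 3, 5], 'content': [1.1, 2.2, 3.3, 4.4, 5.5]}
```

Here the interpreter (run) is written and compiled once. Only the program text produced by generate_program changes with the data type, which is the portability argument of the abstract in miniature. The offsets and content outputs resemble the offsets-plus-content layout that Awkward Array uses for variable-length lists.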