AwkwardForth: accelerating Uproot with an internal DSL
File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but its code depends on the type of the data structure, not known at compile-time. Just-in-time compilation can satisfy both constraints, but we propose a...
Main author: | Pivarski, Jim |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | Conferences |
Online access: | http://cds.cern.ch/record/2767259 |
_version_ | 1780971284731002880 |
---|---|
author | Pivarski, Jim |
author_facet | Pivarski, Jim |
author_sort | Pivarski, Jim |
collection | CERN |
description | File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but its code depends on the type of the data structure, not known at compile-time. Just-in-time compilation can satisfy both constraints, but we propose a more portable solution: specialized virtual machines. AwkwardForth is a Forth-driven virtual machine for deserializing data into Awkward Arrays. As a language, it is not intended for humans to write, but it loosens the coupling between Uproot and Awkward Array. AwkwardForth programs for deserializing record-oriented formats (ROOT and Avro) are about as fast as C++ ROOT and 10‒80× faster than fastavro. Columnar formats (simple TTrees, RNTuple, and Parquet) only require specialization to interpret metadata and are therefore faster with precompiled code. |
id | cern-2767259 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2021 |
record_format | invenio |
spelling | cern-2767259 2022-11-02T22:25:37Z http://cds.cern.ch/record/2767259 eng Pivarski, Jim AwkwardForth: accelerating Uproot with an internal DSL 25th International Conference on Computing in High Energy & Nuclear Physics Conferences File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but its code depends on the type of the data structure, not known at compile-time. Just-in-time compilation can satisfy both constraints, but we propose a more portable solution: specialized virtual machines. AwkwardForth is a Forth-driven virtual machine for deserializing data into Awkward Arrays. As a language, it is not intended for humans to write, but it loosens the coupling between Uproot and Awkward Array. AwkwardForth programs for deserializing record-oriented formats (ROOT and Avro) are about as fast as C++ ROOT and 10‒80× faster than fastavro. Columnar formats (simple TTrees, RNTuple, and Parquet) only require specialization to interpret metadata and are therefore faster with precompiled code. oai:cds.cern.ch:2767259 2021 |
spellingShingle | Conferences Pivarski, Jim AwkwardForth: accelerating Uproot with an internal DSL |
title | AwkwardForth: accelerating Uproot with an internal DSL |
title_full | AwkwardForth: accelerating Uproot with an internal DSL |
title_fullStr | AwkwardForth: accelerating Uproot with an internal DSL |
title_full_unstemmed | AwkwardForth: accelerating Uproot with an internal DSL |
title_short | AwkwardForth: accelerating Uproot with an internal DSL |
title_sort | awkwardforth: accelerating uproot with an internal dsl |
topic | Conferences |
url | http://cds.cern.ch/record/2767259 |
work_keys_str_mv | AT pivarskijim awkwardforthacceleratinguprootwithaninternaldsl AT pivarskijim 25thinternationalconferenceoncomputinginhighenergynuclearphysics |