QONNX: Representing Arbitrary-Precision Quantized Neural Networks

We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.

Bibliographic Details
Main Authors: Pappalardo, Alessandro, Umuroglu, Yaman, Blott, Michaela, Mitrevski, Jovan, Hawks, Ben, Tran, Nhan, Loncar, Vladimir, Summers, Sioni, Borras, Hendrik, Muhizi, Jules, Trahms, Matthew, Hsu, Shih-Chieh, Hauck, Scott, Duarte, Javier
Language: eng
Published: 2022
Subjects: stat.ML; cs.PL; cs.AR; cs.LG
Online Access: http://cds.cern.ch/record/2813346
collection CERN
description We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.
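The quantize-clip-dequantize (QCDQ) idea described in the abstract can be illustrated with a short sketch: standard quantize/dequantize operators work at a fixed 8-bit precision, and an explicit integer clip to a narrower range is what lets the same graph express lower bit widths. The sketch below is illustrative only and not the actual QONNX or ONNX implementation; the function name `qcdq` and its parameters are hypothetical, though the quantize, clip, and dequantize steps follow the standard uniform-quantization formulas.

```python
import numpy as np

def qcdq(x, scale, zero_point, bitwidth, signed=True):
    """Illustrative quantize-clip-dequantize (QCDQ) in floating point.

    The clip step restricts values to the integer range of the target
    bit width (e.g. 4 bits), which is how a lower precision can be
    emulated on top of ordinary quantize/dequantize operators.
    """
    # Representable integer range for the target bit width.
    if signed:
        qmin, qmax = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bitwidth - 1
    q = np.round(x / scale) + zero_point   # quantize to integers
    q = np.clip(q, qmin, qmax)             # clip to the low-precision range
    return (q - zero_point) * scale        # dequantize back to floats

# 4-bit signed quantization with scale 0.25: values map onto the
# grid {-2.0, -1.75, ..., 1.75} and anything outside is saturated.
x = np.array([-1.0, -0.3, 0.05, 0.4, 2.0])
print(qcdq(x, scale=0.25, zero_point=0, bitwidth=4))
```

Note that `np.round` uses round-half-to-even; a real exporter would need to match the rounding mode of the target quantization scheme.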
id cern-2813346
institution European Organization for Nuclear Research
language eng
publishDate 2022
record_format invenio
spelling cern-2813346 2023-01-31T09:42:04Z
http://cds.cern.ch/record/2813346
eng
Pappalardo, Alessandro; Umuroglu, Yaman; Blott, Michaela; Mitrevski, Jovan; Hawks, Ben; Tran, Nhan; Loncar, Vladimir; Summers, Sioni; Borras, Hendrik; Muhizi, Jules; Trahms, Matthew; Hsu, Shih-Chieh; Hauck, Scott; Duarte, Javier
QONNX: Representing Arbitrary-Precision Quantized Neural Networks
stat.ML; cs.PL; cs.AR; cs.LG
arXiv:2206.07527
FERMILAB-CONF-22-471-SCD
oai:cds.cern.ch:2813346
2022-06-15
topic stat.ML
Mathematical Physics and Mathematics
cs.PL
Computing and Computers
cs.AR
Computing and Computers
cs.LG
Computing and Computers
url http://cds.cern.ch/record/2813346
work_keys_str_mv AT pappalardoalessandro qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT umurogluyaman qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT blottmichaela qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT mitrevskijovan qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT hawksben qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT trannhan qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT loncarvladimir qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT summerssioni qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT borrashendrik qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT muhizijules qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT trahmsmatthew qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT hsushihchieh qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT hauckscott qonnxrepresentingarbitraryprecisionquantizedneuralnetworks
AT duartejavier qonnxrepresentingarbitraryprecisionquantizedneuralnetworks