
Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML

We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to...

Full description

Bibliographic Details
Main Authors: Loncar, Vladimir, Ngadiuba, Jennifer, Duarte, Javier, Harris, Philip, Hoang, Duc, Pedro, Kevin, Pierini, Maurizio, Rankin, Dylan, Sagear, Sheila, Summers, Sioni, Tran, Nhan, Jindariani, Sergo, Wu, Zhenbin, Liu, Mia, Di Guglielmo, Giuseppe, Kreinar, Edward
Language: English
Published: 2021
Subjects: hep-ex (Particle Physics - Experiment); eess.SP; cs.LG; Computing and Computers
Report numbers: arXiv:2003.06308; FERMILAB-PUB-20-167-PPD-SCD
Institution: European Organization for Nuclear Research (CERN)
Online Access: https://dx.doi.org/10.1088/2632-2153/aba042
http://cds.cern.ch/record/2715322
We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
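The binary and ternary quantization the abstract describes can be illustrated with a minimal NumPy sketch (not the hls4ml API itself): binary networks constrain each weight to {-1, +1} via the sign function, while ternary networks allow {-1, 0, +1}, zeroing weights whose magnitude falls below a threshold. The threshold value here is an illustrative assumption, not the paper's choice.

```python
import numpy as np

def binarize(w):
    # Binary quantization: map each weight to {-1, +1} by its sign.
    return np.where(w >= 0, 1.0, -1.0)

def ternarize(w, threshold=0.5):
    # Ternary quantization: map weights to {-1, 0, +1}; values with
    # magnitude below the (hypothetical) threshold become 0.
    q = np.zeros_like(w)
    q[w > threshold] = 1.0
    q[w < -threshold] = -1.0
    return q

w = np.array([-0.9, -0.2, 0.1, 0.7])
print(binarize(w))   # [-1. -1.  1.  1.]
print(ternarize(w))  # [-1.  0.  0.  1.]
```

On an FPGA, such weights remove the need for hardware multipliers: a product with a binary weight reduces to a sign flip and a ternary weight additionally allows skipping the term entirely, which is the source of the resource savings reported in the paper.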