Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML
We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
Main authors: | Loncar, Vladimir; Ngadiuba, Jennifer; Duarte, Javier; Harris, Philip; Hoang, Duc; Pedro, Kevin; Pierini, Maurizio; Rankin, Dylan; Sagear, Sheila; Summers, Sioni; Tran, Nhan; Jindariani, Sergo; Wu, Zhenbin; Liu, Mia; Di Guglielmo, Giuseppe; Kreinar, Edward |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | hep-ex (Particle Physics - Experiment); eess.SP; cs.LG (Computing and Computers) |
Online access: | https://dx.doi.org/10.1088/2632-2153/aba042 http://cds.cern.ch/record/2715322 |
_version_ | 1780965425566187520 |
author | Loncar, Vladimir Ngadiuba, Jennifer Duarte, Javier Harris, Philip Hoang, Duc Pedro, Kevin Pierini, Maurizio Rankin, Dylan Sagear, Sheila Summers, Sioni Tran, Nhan Jindariani, Sergo Wu, Zhenbin Liu, Mia Di Guglielmo, Giuseppe Kreinar, Edward |
author_facet | Loncar, Vladimir Ngadiuba, Jennifer Duarte, Javier Harris, Philip Hoang, Duc Pedro, Kevin Pierini, Maurizio Rankin, Dylan Sagear, Sheila Summers, Sioni Tran, Nhan Jindariani, Sergo Wu, Zhenbin Liu, Mia Di Guglielmo, Giuseppe Kreinar, Edward |
author_sort | Loncar, Vladimir |
collection | CERN |
description | We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources. |
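The abstract's core technique, reducing network weights to binary ({-1, +1}) or ternary ({-1, 0, +1}) values, can be illustrated with a short NumPy sketch. This shows common binarization and ternarization schemes (sign-based binarization, and a magnitude threshold set to a fraction of the mean absolute weight, as in ternary weight networks); the threshold heuristic and the function names here are illustrative assumptions, not necessarily the exact procedure used by hls4ml or this paper.

```python
import numpy as np

def binarize(w):
    """Map each weight to {-1, +1} via its sign.
    (Zeros map to +1; a common convention, assumed here.)"""
    return np.where(w >= 0, 1.0, -1.0)

def ternarize(w, delta_scale=0.7):
    """Map each weight to {-1, 0, +1} using a magnitude threshold.
    delta_scale * mean(|w|) is a common threshold heuristic
    (an assumption here, not necessarily the paper's choice)."""
    delta = delta_scale * np.mean(np.abs(w))
    # Keep the sign only where the magnitude clears the threshold.
    return np.sign(w) * (np.abs(w) > delta)

w = np.array([0.8, -0.05, 0.3, -0.9, 0.0])
print(binarize(w))   # every weight collapses to +/-1
print(ternarize(w))  # small-magnitude weights collapse to 0
```

Replacing multi-bit weights with such 1- or 2-bit values is what lets the FPGA implementation trade multipliers for simple sign flips and additions, which is the source of the resource savings the abstract describes.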
id | cern-2715322 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2021 |
record_format | invenio |
spelling | cern-2715322; 2022-02-22T13:00:31Z; doi:10.1088/2632-2153/aba042; http://cds.cern.ch/record/2715322; eng. Loncar, Vladimir; Ngadiuba, Jennifer; Duarte, Javier; Harris, Philip; Hoang, Duc; Pedro, Kevin; Pierini, Maurizio; Rankin, Dylan; Sagear, Sheila; Summers, Sioni; Tran, Nhan; Jindariani, Sergo; Wu, Zhenbin; Liu, Mia; Di Guglielmo, Giuseppe; Kreinar, Edward. Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML. Subjects: hep-ex (Particle Physics - Experiment); eess.SP; cs.LG (Computing and Computers). We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources. arXiv:2003.06308; FERMILAB-PUB-20-167-PPD-SCD; oai:cds.cern.ch:2715322; 2021 |
spellingShingle | hep-ex Particle Physics - Experiment eess.SP cs.LG Computing and Computers Loncar, Vladimir Ngadiuba, Jennifer Duarte, Javier Harris, Philip Hoang, Duc Pedro, Kevin Pierini, Maurizio Rankin, Dylan Sagear, Sheila Summers, Sioni Tran, Nhan Jindariani, Sergo Wu, Zhenbin Liu, Mia Di Guglielmo, Giuseppe Kreinar, Edward Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_full | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_fullStr | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_full_unstemmed | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_short | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_sort | compressing deep neural networks on fpgas to binary and ternary precision with hls4ml |
topic | hep-ex Particle Physics - Experiment eess.SP cs.LG Computing and Computers |
url | https://dx.doi.org/10.1088/2632-2153/aba042 http://cds.cern.ch/record/2715322 |
work_keys_str_mv | AT loncarvladimir compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT ngadiubajennifer compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT duartejavier compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT harrisphilip compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT hoangduc compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT pedrokevin compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT pierinimaurizio compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT rankindylan compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT sagearsheila compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT summerssioni compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT trannhan compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT jindarianisergo compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT wuzhenbin compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT liumia compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT diguglielmogiuseppe compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT kreinaredward compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml |