Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML
We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
Main authors: | Loncar, Vladimir; Ngadiuba, Jennifer; Duarte, Javier; Harris, Philip; Hoang, Duc; Pedro, Kevin; Pierini, Maurizio; Rankin, Dylan; Sagear, Sheila; Summers, Sioni; Tran, Nhan; Jindariani, Sergo; Wu, Zhenbin; Liu, Mia; Di Guglielmo, Giuseppe; Kreinar, Edward |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | hep-ex (Particle Physics - Experiment); eess.SP; cs.LG (Computing and Computers) |
Online access: | https://dx.doi.org/10.1088/2632-2153/aba042 http://cds.cern.ch/record/2715322 |
_version_ | 1780965425566187520 |
author | Loncar, Vladimir Ngadiuba, Jennifer Duarte, Javier Harris, Philip Hoang, Duc Pedro, Kevin Pierini, Maurizio Rankin, Dylan Sagear, Sheila Summers, Sioni Tran, Nhan Jindariani, Sergo Wu, Zhenbin Liu, Mia Di Guglielmo, Giuseppe Kreinar, Edward |
author_facet | Loncar, Vladimir Ngadiuba, Jennifer Duarte, Javier Harris, Philip Hoang, Duc Pedro, Kevin Pierini, Maurizio Rankin, Dylan Sagear, Sheila Summers, Sioni Tran, Nhan Jindariani, Sergo Wu, Zhenbin Liu, Mia Di Guglielmo, Giuseppe Kreinar, Edward |
author_sort | Loncar, Vladimir |
collection | CERN |
description | We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources. |
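The abstract's core technique, reducing network weights to binary ({-1, +1}) or ternary ({-1, 0, +1}) values, can be illustrated with a short NumPy sketch. This shows common binarization and ternarization schemes (sign-based binarization, and a magnitude threshold set to a fraction of the mean absolute weight, as in ternary weight networks); the threshold heuristic and the function names here are illustrative assumptions, not necessarily the exact procedure used by hls4ml or this paper.

```python
import numpy as np

def binarize(w):
    """Map each weight to {-1, +1} via its sign.
    (Zeros map to +1; a common convention, assumed here.)"""
    return np.where(w >= 0, 1.0, -1.0)

def ternarize(w, delta_scale=0.7):
    """Map each weight to {-1, 0, +1} using a magnitude threshold.
    delta_scale * mean(|w|) is a common threshold heuristic
    (an assumption here, not necessarily the paper's choice)."""
    delta = delta_scale * np.mean(np.abs(w))
    # Keep the sign only where the magnitude clears the threshold.
    return np.sign(w) * (np.abs(w) > delta)

w = np.array([0.8, -0.05, 0.3, -0.9, 0.0])
print(binarize(w))   # every weight collapses to +/-1
print(ternarize(w))  # small-magnitude weights collapse to 0
```

Replacing multi-bit weights with such 1- or 2-bit values is what lets the FPGA implementation trade multipliers for simple sign flips and additions, which is the source of the resource savings the abstract describes.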
id | cern-2715322 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2021 |
record_format | invenio |
spelling | cern-2715322; 2022-02-22T13:00:31Z; doi:10.1088/2632-2153/aba042; http://cds.cern.ch/record/2715322; eng. Loncar, Vladimir; Ngadiuba, Jennifer; Duarte, Javier; Harris, Philip; Hoang, Duc; Pedro, Kevin; Pierini, Maurizio; Rankin, Dylan; Sagear, Sheila; Summers, Sioni; Tran, Nhan; Jindariani, Sergo; Wu, Zhenbin; Liu, Mia; Di Guglielmo, Giuseppe; Kreinar, Edward. Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML. Subjects: hep-ex (Particle Physics - Experiment); eess.SP; cs.LG (Computing and Computers). We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources. arXiv:2003.06308; FERMILAB-PUB-20-167-PPD-SCD; oai:cds.cern.ch:2715322; 2021 |
spellingShingle | hep-ex Particle Physics - Experiment eess.SP cs.LG Computing and Computers Loncar, Vladimir Ngadiuba, Jennifer Duarte, Javier Harris, Philip Hoang, Duc Pedro, Kevin Pierini, Maurizio Rankin, Dylan Sagear, Sheila Summers, Sioni Tran, Nhan Jindariani, Sergo Wu, Zhenbin Liu, Mia Di Guglielmo, Giuseppe Kreinar, Edward Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_full | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_fullStr | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_full_unstemmed | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_short | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML |
title_sort | compressing deep neural networks on fpgas to binary and ternary precision with hls4ml |
topic | hep-ex Particle Physics - Experiment eess.SP cs.LG Computing and Computers |
url | https://dx.doi.org/10.1088/2632-2153/aba042 http://cds.cern.ch/record/2715322 |
work_keys_str_mv | AT loncarvladimir compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT ngadiubajennifer compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT duartejavier compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT harrisphilip compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT hoangduc compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT pedrokevin compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT pierinimaurizio compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT rankindylan compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT sagearsheila compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT summerssioni compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT trannhan compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT jindarianisergo compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT wuzhenbin compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT liumia compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT diguglielmogiuseppe compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml AT kreinaredward compressingdeepneuralnetworksonfpgastobinaryandternaryprecisionwithhls4ml |