
Fast convolutional neural networks on FPGAs with hls4ml

We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.


Bibliographic Details
Main Authors: Aarrestad, Thea, Loncar, Vladimir, Ghielmetti, Nicolò, Pierini, Maurizio, Summers, Sioni, Ngadiuba, Jennifer, Petersson, Christoffer, Linander, Hampus, Iiyama, Yutaro, Di Guglielmo, Giuseppe, Duarte, Javier, Harris, Philip, Rankin, Dylan, Jindariani, Sergo, Pedro, Kevin, Tran, Nhan, Liu, Mia, Kreinar, Edward, Wu, Zhenbin, Hoang, Duc
Language: eng
Published: 2021
Subjects: stat.ML; physics.ins-det; hep-ex; cs.CV; cs.LG
Online Access: https://dx.doi.org/10.1088/2632-2153/ac0ea1
http://cds.cern.ch/record/2751704
author Aarrestad, Thea
Loncar, Vladimir
Ghielmetti, Nicolò
Pierini, Maurizio
Summers, Sioni
Ngadiuba, Jennifer
Petersson, Christoffer
Linander, Hampus
Iiyama, Yutaro
Di Guglielmo, Giuseppe
Duarte, Javier
Harris, Philip
Rankin, Dylan
Jindariani, Sergo
Pedro, Kevin
Tran, Nhan
Liu, Mia
Kreinar, Edward
Wu, Zhenbin
Hoang, Duc
author_sort Aarrestad, Thea
collection CERN
description We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.
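
The description above summarizes the workflow (quantization-aware training, pruning, and automated HLS conversion) only at a high level. As a rough, illustrative sketch of what such a flow looks like in code, the example below builds a small quantized CNN with QKeras and converts it with hls4ml; the architecture, bit widths, FPGA part, and output directory are assumptions made for this example, not the paper's exact configuration.

```python
# Minimal sketch of a QKeras + hls4ml flow (illustrative only; the paper's
# exact architecture, bit widths, and FPGA part are not reproduced here).
import hls4ml
from tensorflow.keras.layers import Input, Flatten
from tensorflow.keras.models import Model
from qkeras import QConv2D, QDense, QActivation, quantized_bits

# Toy quantized CNN for 32x32x3 inputs (SVHN-sized images).
quant = quantized_bits(6, 0, alpha=1)          # 6-bit weights/biases (assumption)
inputs = Input(shape=(32, 32, 3))
x = QConv2D(16, (3, 3), kernel_quantizer=quant, bias_quantizer=quant)(inputs)
x = QActivation('quantized_relu(6)')(x)
x = Flatten()(x)
outputs = QDense(10, kernel_quantizer=quant, bias_quantizer=quant)(x)
model = Model(inputs, outputs)

# ... quantization-aware training (and optional magnitude pruning) goes here ...

# Convert the trained model into an HLS project with hls4ml.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',              # streaming implementation for convolutional layers
    output_dir='hls4ml_cnn_prj',      # hypothetical output directory
    part='xcku115-flvb2104-2-i',      # assumed FPGA part, not necessarily the paper's
)
hls_model.compile()                    # C emulation model for bit-accurate checks
# hls_model.build(synth=True)          # run HLS synthesis (requires Xilinx tools)
```

With `granularity='name'` each layer gets its own precision and reuse-factor entry in the generated configuration, which is how per-layer resource and latency trade-offs of the kind studied in the paper are typically explored.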
id cern-2751704
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2021
record_format invenio
spelling cern-2751704 (last modified 2023-01-31T10:18:02Z)
doi:10.1088/2632-2153/ac0ea1
http://cds.cern.ch/record/2751704
arXiv:2101.05108
FERMILAB-PUB-21-130-SCD
oai:cds.cern.ch:2751704
2021-01-13
title Fast convolutional neural networks on FPGAs with hls4ml
topic stat.ML
Mathematical Physics and Mathematics
physics.ins-det
Detectors and Experimental Techniques
hep-ex
Particle Physics - Experiment
cs.CV
Computing and Computers
cs.LG
Computing and Computers
url https://dx.doi.org/10.1088/2632-2153/ac0ea1
http://cds.cern.ch/record/2751704