Fast convolutional neural networks on FPGAs with hls4ml
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider.
Main authors: | Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | stat.ML; physics.ins-det; hep-ex; cs.CV; cs.LG |
Acceso en línea: | https://dx.doi.org/10.1088/2632-2153/ac0ea1 http://cds.cern.ch/record/2751704 |
_version_ | 1780969199742484480 |
---|---|
author | Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc |
author_facet | Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc |
author_sort | Aarrestad, Thea |
collection | CERN |
description | We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation. |
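The description names the two compression techniques the paper relies on: magnitude-based pruning and quantization to fixed-point arithmetic (hls4ml maps weights to HLS `ap_fixed<W,I>` types). A minimal pure-Python sketch of both ideas follows; the function names are illustrative, and this is not the hls4ml or QKeras API, only an emulation of the underlying arithmetic under those assumptions.

```python
def quantize_fixed(x, total_bits=8, int_bits=3):
    """Round x to the nearest value representable as a signed fixed-point
    number with total_bits bits, int_bits of them integer (the arithmetic
    behind an HLS ap_fixed<total_bits, int_bits> type), saturating at the
    representable range."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits                        # LSB = 1 / scale
    lo = -(1 << (int_bits - 1))                   # most negative value
    hi = (1 << (int_bits - 1)) - 1.0 / scale      # most positive value
    q = round(x * scale) / scale                  # round to nearest LSB
    return max(lo, min(hi, q))

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with smallest magnitude,
    the criterion behind the pruning the paper applies during training."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])                   # indices that survive
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]
```

For example, `quantize_fixed(0.30)` returns 0.3125 (the nearest multiple of 1/32), and `prune_by_magnitude([0.5, -0.1, 0.8, 0.05], 0.5)` zeroes the two smallest-magnitude entries, giving `[0.5, 0.0, 0.8, 0.0]`. In hls4ml the zeroed multiplications and the narrow fixed-point words are what free up the FPGA's DSP and LUT resources.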
id | cern-2751704 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2021 |
record_format | invenio |
spelling | cern-2751704; 2023-01-31T10:18:02Z; doi:10.1088/2632-2153/ac0ea1; http://cds.cern.ch/record/2751704; eng; Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc. Fast convolutional neural networks on FPGAs with hls4ml. stat.ML Mathematical Physics and Mathematics; physics.ins-det Detectors and Experimental Techniques; hep-ex Particle Physics - Experiment; cs.CV Computing and Computers; cs.LG Computing and Computers. We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation. arXiv:2101.05108; FERMILAB-PUB-21-130-SCD; oai:cds.cern.ch:2751704; 2021-01-13 |
spellingShingle | stat.ML Mathematical Physics and Mathematics; physics.ins-det Detectors and Experimental Techniques; hep-ex Particle Physics - Experiment; cs.CV Computing and Computers; cs.LG Computing and Computers; Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc; Fast convolutional neural networks on FPGAs with hls4ml |
title | Fast convolutional neural networks on FPGAs with hls4ml |
title_full | Fast convolutional neural networks on FPGAs with hls4ml |
title_fullStr | Fast convolutional neural networks on FPGAs with hls4ml |
title_full_unstemmed | Fast convolutional neural networks on FPGAs with hls4ml |
title_short | Fast convolutional neural networks on FPGAs with hls4ml |
title_sort | fast convolutional neural networks on fpgas with hls4ml |
topic | stat.ML Mathematical Physics and Mathematics; physics.ins-det Detectors and Experimental Techniques; hep-ex Particle Physics - Experiment; cs.CV Computing and Computers; cs.LG Computing and Computers |
url | https://dx.doi.org/10.1088/2632-2153/ac0ea1 http://cds.cern.ch/record/2751704 |
work_keys_str_mv | AT aarrestadthea fastconvolutionalneuralnetworksonfpgaswithhls4ml AT loncarvladimir fastconvolutionalneuralnetworksonfpgaswithhls4ml AT ghielmettinicolo fastconvolutionalneuralnetworksonfpgaswithhls4ml AT pierinimaurizio fastconvolutionalneuralnetworksonfpgaswithhls4ml AT summerssioni fastconvolutionalneuralnetworksonfpgaswithhls4ml AT ngadiubajennifer fastconvolutionalneuralnetworksonfpgaswithhls4ml AT peterssonchristoffer fastconvolutionalneuralnetworksonfpgaswithhls4ml AT linanderhampus fastconvolutionalneuralnetworksonfpgaswithhls4ml AT iiyamayutaro fastconvolutionalneuralnetworksonfpgaswithhls4ml AT diguglielmogiuseppe fastconvolutionalneuralnetworksonfpgaswithhls4ml AT duartejavier fastconvolutionalneuralnetworksonfpgaswithhls4ml AT harrisphilip fastconvolutionalneuralnetworksonfpgaswithhls4ml AT rankindylan fastconvolutionalneuralnetworksonfpgaswithhls4ml AT jindarianisergo fastconvolutionalneuralnetworksonfpgaswithhls4ml AT pedrokevin fastconvolutionalneuralnetworksonfpgaswithhls4ml AT trannhan fastconvolutionalneuralnetworksonfpgaswithhls4ml AT liumia fastconvolutionalneuralnetworksonfpgaswithhls4ml AT kreinaredward fastconvolutionalneuralnetworksonfpgaswithhls4ml AT wuzhenbin fastconvolutionalneuralnetworksonfpgaswithhls4ml AT hoangduc fastconvolutionalneuralnetworksonfpgaswithhls4ml |