Fast convolutional neural networks on FPGAs with hls4ml
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider.
Main authors: | Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | stat.ML; physics.ins-det; hep-ex; cs.CV; cs.LG |
Acceso en línea: | https://dx.doi.org/10.1088/2632-2153/ac0ea1 http://cds.cern.ch/record/2751704 |
_version_ | 1780969199742484480 |
---|---|
author | Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc |
author_facet | Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc |
author_sort | Aarrestad, Thea |
collection | CERN |
description | We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation. |
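The description names the two compression techniques the paper relies on: magnitude-based pruning and quantization to fixed-point arithmetic (hls4ml maps weights to HLS `ap_fixed<W,I>` types). A minimal pure-Python sketch of both ideas follows; the function names are illustrative, and this is not the hls4ml or QKeras API, only an emulation of the underlying arithmetic under those assumptions.

```python
def quantize_fixed(x, total_bits=8, int_bits=3):
    """Round x to the nearest value representable as a signed fixed-point
    number with total_bits bits, int_bits of them integer (the arithmetic
    behind an HLS ap_fixed<total_bits, int_bits> type), saturating at the
    representable range."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits                        # LSB = 1 / scale
    lo = -(1 << (int_bits - 1))                   # most negative value
    hi = (1 << (int_bits - 1)) - 1.0 / scale      # most positive value
    q = round(x * scale) / scale                  # round to nearest LSB
    return max(lo, min(hi, q))

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with smallest magnitude,
    the criterion behind the pruning the paper applies during training."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])                   # indices that survive
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]
```

For example, `quantize_fixed(0.30)` returns 0.3125 (the nearest multiple of 1/32), and `prune_by_magnitude([0.5, -0.1, 0.8, 0.05], 0.5)` zeroes the two smallest-magnitude entries, giving `[0.5, 0.0, 0.8, 0.0]`. In hls4ml the zeroed multiplications and the narrow fixed-point words are what free up the FPGA's DSP and LUT resources.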
id | cern-2751704 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2021 |
record_format | invenio |
spelling | cern-2751704; 2023-01-31T10:18:02Z; doi:10.1088/2632-2153/ac0ea1; http://cds.cern.ch/record/2751704; eng; Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc. Fast convolutional neural networks on FPGAs with hls4ml. stat.ML Mathematical Physics and Mathematics; physics.ins-det Detectors and Experimental Techniques; hep-ex Particle Physics - Experiment; cs.CV Computing and Computers; cs.LG Computing and Computers. We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation. arXiv:2101.05108; FERMILAB-PUB-21-130-SCD; oai:cds.cern.ch:2751704; 2021-01-13 |
spellingShingle | stat.ML Mathematical Physics and Mathematics; physics.ins-det Detectors and Experimental Techniques; hep-ex Particle Physics - Experiment; cs.CV Computing and Computers; cs.LG Computing and Computers; Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc; Fast convolutional neural networks on FPGAs with hls4ml |
title | Fast convolutional neural networks on FPGAs with hls4ml |
title_full | Fast convolutional neural networks on FPGAs with hls4ml |
title_fullStr | Fast convolutional neural networks on FPGAs with hls4ml |
title_full_unstemmed | Fast convolutional neural networks on FPGAs with hls4ml |
title_short | Fast convolutional neural networks on FPGAs with hls4ml |
title_sort | fast convolutional neural networks on fpgas with hls4ml |
topic | stat.ML Mathematical Physics and Mathematics; physics.ins-det Detectors and Experimental Techniques; hep-ex Particle Physics - Experiment; cs.CV Computing and Computers; cs.LG Computing and Computers |
url | https://dx.doi.org/10.1088/2632-2153/ac0ea1 http://cds.cern.ch/record/2751704 |
work_keys_str_mv | AT aarrestadthea fastconvolutionalneuralnetworksonfpgaswithhls4ml AT loncarvladimir fastconvolutionalneuralnetworksonfpgaswithhls4ml AT ghielmettinicolo fastconvolutionalneuralnetworksonfpgaswithhls4ml AT pierinimaurizio fastconvolutionalneuralnetworksonfpgaswithhls4ml AT summerssioni fastconvolutionalneuralnetworksonfpgaswithhls4ml AT ngadiubajennifer fastconvolutionalneuralnetworksonfpgaswithhls4ml AT peterssonchristoffer fastconvolutionalneuralnetworksonfpgaswithhls4ml AT linanderhampus fastconvolutionalneuralnetworksonfpgaswithhls4ml AT iiyamayutaro fastconvolutionalneuralnetworksonfpgaswithhls4ml AT diguglielmogiuseppe fastconvolutionalneuralnetworksonfpgaswithhls4ml AT duartejavier fastconvolutionalneuralnetworksonfpgaswithhls4ml AT harrisphilip fastconvolutionalneuralnetworksonfpgaswithhls4ml AT rankindylan fastconvolutionalneuralnetworksonfpgaswithhls4ml AT jindarianisergo fastconvolutionalneuralnetworksonfpgaswithhls4ml AT pedrokevin fastconvolutionalneuralnetworksonfpgaswithhls4ml AT trannhan fastconvolutionalneuralnetworksonfpgaswithhls4ml AT liumia fastconvolutionalneuralnetworksonfpgaswithhls4ml AT kreinaredward fastconvolutionalneuralnetworksonfpgaswithhls4ml AT wuzhenbin fastconvolutionalneuralnetworksonfpgaswithhls4ml AT hoangduc fastconvolutionalneuralnetworksonfpgaswithhls4ml |