An OpenCL-Based FPGA Accelerator for Faster R-CNN

In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the research corresponds to hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as YOLO, there are still...

Bibliographic Details
Main Authors:	An, Jianjing; Zhang, Dezheng; Xu, Ke; Wang, Dong
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9600897/ https://www.ncbi.nlm.nih.gov/pubmed/37420365 http://dx.doi.org/10.3390/e24101346

author	An, Jianjing; Zhang, Dezheng; Xu, Ke; Wang, Dong
collection	PubMed
description	In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the research corresponds to hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as YOLO, there are still few accelerator designs for faster regions with CNN features (Faster R-CNN) algorithms. Moreover, the inherently high computational and memory complexity of CNNs poses challenges for the design of efficient accelerators. This paper proposes a software-hardware co-design scheme based on OpenCL to implement a Faster R-CNN object detection algorithm on FPGA. First, we design an efficient, deeply pipelined FPGA hardware accelerator that can implement Faster R-CNN algorithms for different backbone networks. Then, we propose an optimized hardware-aware software algorithm, including fixed-point quantization, layer fusion, and a multi-batch region of interest (RoI) detector. Finally, we present an end-to-end design space exploration scheme to comprehensively evaluate the performance and resource utilization of the proposed accelerator. Experimental results show that the proposed design achieves a peak throughput of 846.9 GOP/s at a working frequency of 172 MHz. Compared with the state-of-the-art Faster R-CNN accelerator and the one-stage YOLO accelerator, our method achieves [Formula: see text] and [Formula: see text] inference throughput improvements, respectively.
format	Online Article Text
id	pubmed-9600897
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9600897 2022-10-27 An OpenCL-Based FPGA Accelerator for Faster R-CNN An, Jianjing; Zhang, Dezheng; Xu, Ke; Wang, Dong Entropy (Basel) Article MDPI 2022-09-23 /pmc/articles/PMC9600897/ /pubmed/37420365 http://dx.doi.org/10.3390/e24101346 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	An OpenCL-Based FPGA Accelerator for Faster R-CNN
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9600897/ https://www.ncbi.nlm.nih.gov/pubmed/37420365 http://dx.doi.org/10.3390/e24101346
