
Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions

Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exploit multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to tune the spatial resolution of the received fmaps properly. The ever-growing real-time and low-power requirements of modern computer vision applications represent a stimulus for the research community to investigate the deployment of CNNs on well-suited hardware platforms, such as field programmable gate arrays (FPGAs). FPGAs are widely recognized as valid candidates for trading off computational speed and power consumption, thanks to their flexibility and their capability to also deal with computationally intensive models. In order to reduce the number of operations to be performed, this paper presents a novel hardware-oriented algorithm able to efficiently accelerate both CONVs and TCONVs. The proposed strategy was validated by employing it within a reconfigurable hardware accelerator purposely designed to adapt itself to different operating modes set at run-time. When characterized using the Xilinx XC7K410T FPGA device, the proposed accelerator achieved a throughput of up to 2022.2 GOPS and, in comparison to state-of-the-art competitors, it reached an energy efficiency up to 2.3 times higher, without compromising the overall accuracy.
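The abstract contrasts the two MAC-based operations the accelerator targets: CONV layers slide a kernel over the feature map and accumulate products, while TCONV layers scatter-accumulate each input pixel against the kernel to enlarge the map. As a rough illustration of that distinction (a minimal NumPy sketch, not the paper's hardware-oriented algorithm), the following shows how a 3×3 CONV shrinks a 6×6 fmap while a stride-2 TCONV with the same kernel upsamples it:

```python
import numpy as np

def conv2d(fmap, kernel, stride=1):
    """Plain 2D convolution: each output pixel is one MAC over a kernel window."""
    k = kernel.shape[0]
    out_h = (fmap.shape[0] - k) // stride + 1
    out_w = (fmap.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(window * kernel)  # multiply-and-accumulate
    return out

def tconv2d(fmap, kernel, stride=2):
    """Transposed convolution: scatter-accumulate each input pixel times the kernel."""
    k = kernel.shape[0]
    out_h = (fmap.shape[0] - 1) * stride + k
    out_w = (fmap.shape[1] - 1) * stride + k
    out = np.zeros((out_h, out_w))
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += fmap[i, j] * kernel
    return out

x = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 feature map
w = np.ones((3, 3))                           # toy 3x3 kernel
print(conv2d(x, w).shape)    # (4, 4): 3x3 CONV shrinks the fmap
print(tconv2d(x, w).shape)   # (13, 13): stride-2 TCONV enlarges it
```

Both loops are pure MACs, which is why a single reconfigurable datapath, as proposed in the paper, can serve CONV and TCONV by changing only how windows are addressed and accumulated.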


Bibliographic Details
Main Authors: Sestito, Cristian; Spagnolo, Fanny; Perri, Stefania
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8538663/
https://www.ncbi.nlm.nih.gov/pubmed/34677296
http://dx.doi.org/10.3390/jimaging7100210
Journal: J Imaging. Published online: 2021-10-12.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).