
Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions

Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exploit multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to tune the spatial resolution of the received fmaps properly. The ever-growing real-time and low-power requirements of modern computer vision applications represent a stimulus for the research community to investigate the deployment of CNNs on well-suited hardware platforms, such as field programmable gate arrays (FPGAs). FPGAs are widely recognized as valid candidates for trading off computational speed and power consumption, thanks to their flexibility and their capability to also deal with computationally intensive models. In order to reduce the number of operations to be performed, this paper presents a novel hardware-oriented algorithm able to efficiently accelerate both CONVs and TCONVs. The proposed strategy was validated by employing it within a reconfigurable hardware accelerator purposely designed to adapt itself to different operating modes set at run-time. When characterized using the Xilinx XC7K410T FPGA device, the proposed accelerator achieved a throughput of up to 2022.2 GOPS and, in comparison to state-of-the-art competitors, it reached an energy efficiency up to 2.3 times higher, without compromising the overall accuracy.
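The abstract contrasts the two MAC-based operations the accelerator targets: CONV layers slide a kernel over the feature map and accumulate products, while TCONV layers scatter-accumulate each input pixel against the kernel to enlarge the map. As a rough illustration of that distinction (a minimal NumPy sketch, not the paper's hardware-oriented algorithm), the following shows how a 3×3 CONV shrinks a 6×6 fmap while a stride-2 TCONV with the same kernel upsamples it:

```python
import numpy as np

def conv2d(fmap, kernel, stride=1):
    """Plain 2D convolution: each output pixel is one MAC over a kernel window."""
    k = kernel.shape[0]
    out_h = (fmap.shape[0] - k) // stride + 1
    out_w = (fmap.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(window * kernel)  # multiply-and-accumulate
    return out

def tconv2d(fmap, kernel, stride=2):
    """Transposed convolution: scatter-accumulate each input pixel times the kernel."""
    k = kernel.shape[0]
    out_h = (fmap.shape[0] - 1) * stride + k
    out_w = (fmap.shape[1] - 1) * stride + k
    out = np.zeros((out_h, out_w))
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += fmap[i, j] * kernel
    return out

x = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 feature map
w = np.ones((3, 3))                           # toy 3x3 kernel
print(conv2d(x, w).shape)    # (4, 4): 3x3 CONV shrinks the fmap
print(tconv2d(x, w).shape)   # (13, 13): stride-2 TCONV enlarges it
```

Both loops are pure MACs, which is why a single reconfigurable datapath, as proposed in the paper, can serve CONV and TCONV by changing only how windows are addressed and accumulated.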


Bibliographic Details
Main Authors: Sestito, Cristian; Spagnolo, Fanny; Perri, Stefania
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8538663/
https://www.ncbi.nlm.nih.gov/pubmed/34677296
http://dx.doi.org/10.3390/jimaging7100210
Journal: J Imaging. Published online: 2021-10-12.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).