Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions
Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exp...
Main authors: | Sestito, Cristian; Spagnolo, Fanny; Perri, Stefania |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8538663/ https://www.ncbi.nlm.nih.gov/pubmed/34677296 http://dx.doi.org/10.3390/jimaging7100210 |
_version_ | 1784588560545873920 |
---|---|
author | Sestito, Cristian; Spagnolo, Fanny; Perri, Stefania |
author_facet | Sestito, Cristian; Spagnolo, Fanny; Perri, Stefania |
author_sort | Sestito, Cristian |
collection | PubMed |
description | Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exploit multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to tune the spatial resolution of the received fmaps properly. The ever-growing real-time and low-power requirements of modern computer vision applications represent a stimulus for the research community to investigate the deployment of CNNs on well-suited hardware platforms, such as field programmable gate arrays (FPGAs). FPGAs are widely recognized as valid candidates for trading off computational speed and power consumption, thanks to their flexibility and their capability to also deal with computationally intensive models. In order to reduce the number of operations to be performed, this paper presents a novel hardware-oriented algorithm able to efficiently accelerate both CONVs and TCONVs. The proposed strategy was validated by employing it within a reconfigurable hardware accelerator purposely designed to adapt itself to different operating modes set at run-time. When characterized using the Xilinx XC7K410T FPGA device, the proposed accelerator achieved a throughput of up to 2022.2 GOPS and, in comparison to state-of-the-art competitors, it reached an energy efficiency up to 2.3 times higher, without compromising the overall accuracy. |
format | Online Article Text |
id | pubmed-8538663 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-85386632021-10-28 Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions Sestito, Cristian Spagnolo, Fanny Perri, Stefania J Imaging Article Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exploit multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to tune the spatial resolution of the received fmaps properly. The ever-growing real-time and low-power requirements of modern computer vision applications represent a stimulus for the research community to investigate the deployment of CNNs on well-suited hardware platforms, such as field programmable gate arrays (FPGAs). FPGAs are widely recognized as valid candidates for trading off computational speed and power consumption, thanks to their flexibility and their capability to also deal with computationally intensive models. In order to reduce the number of operations to be performed, this paper presents a novel hardware-oriented algorithm able to efficiently accelerate both CONVs and TCONVs. The proposed strategy was validated by employing it within a reconfigurable hardware accelerator purposely designed to adapt itself to different operating modes set at run-time. When characterized using the Xilinx XC7K410T FPGA device, the proposed accelerator achieved a throughput of up to 2022.2 GOPS and, in comparison to state-of-the-art competitors, it reached an energy efficiency up to 2.3 times higher, without compromising the overall accuracy. MDPI 2021-10-12 /pmc/articles/PMC8538663/ /pubmed/34677296 http://dx.doi.org/10.3390/jimaging7100210 Text en © 2021 by the authors. 
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Sestito, Cristian Spagnolo, Fanny Perri, Stefania Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions |
title | Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions |
title_full | Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions |
title_fullStr | Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions |
title_full_unstemmed | Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions |
title_short | Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions |
title_sort | design of flexible hardware accelerators for image convolutions and transposed convolutions |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8538663/ https://www.ncbi.nlm.nih.gov/pubmed/34677296 http://dx.doi.org/10.3390/jimaging7100210 |
work_keys_str_mv | AT sestitocristian designofflexiblehardwareacceleratorsforimageconvolutionsandtransposedconvolutions AT spagnolofanny designofflexiblehardwareacceleratorsforimageconvolutionsandtransposedconvolutions AT perristefania designofflexiblehardwareacceleratorsforimageconvolutionsandtransposedconvolutions |