Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip

Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracies they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfo...

Bibliographic Details
Main Authors: Perri, Stefania; Sestito, Cristian; Spagnolo, Fanny; Corsonello, Pasquale
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321086/
https://www.ncbi.nlm.nih.gov/pubmed/34460742
http://dx.doi.org/10.3390/jimaging6090085
_version_ 1783730768042786816
author Perri, Stefania
Sestito, Cristian
Spagnolo, Fanny
Corsonello, Pasquale
author_facet Perri, Stefania
Sestito, Cristian
Spagnolo, Fanny
Corsonello, Pasquale
author_sort Perri, Stefania
collection PubMed
description Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracies they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, still quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator is easily scalable to comply with the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies ~5.6%, ~4.1%, ~17%, and ~96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the greater exploitable parallelism, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to ~20% higher, and saves more than 60% of the power consumption and more than 80% of the logic resources, while using ~5.7× fewer on-chip memory resources.
format Online
Article
Text
id pubmed-8321086
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8321086 2021-08-26 Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip Perri, Stefania Sestito, Cristian Spagnolo, Fanny Corsonello, Pasquale J Imaging Article Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracies they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, still quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator is easily scalable to comply with the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies ~5.6%, ~4.1%, ~17%, and ~96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the greater exploitable parallelism, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to ~20% higher, and saves more than 60% of the power consumption and more than 80% of the logic resources, while using ~5.7× fewer on-chip memory resources. MDPI 2020-08-25 /pmc/articles/PMC8321086/ /pubmed/34460742 http://dx.doi.org/10.3390/jimaging6090085 Text en © 2020 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Perri, Stefania
Sestito, Cristian
Spagnolo, Fanny
Corsonello, Pasquale
Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
title Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
title_full Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
title_fullStr Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
title_full_unstemmed Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
title_short Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
title_sort efficient deconvolution architecture for heterogeneous systems-on-chip
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321086/
https://www.ncbi.nlm.nih.gov/pubmed/34460742
http://dx.doi.org/10.3390/jimaging6090085
work_keys_str_mv AT perristefania efficientdeconvolutionarchitectureforheterogeneoussystemsonchip
AT sestitocristian efficientdeconvolutionarchitectureforheterogeneoussystemsonchip
AT spagnolofanny efficientdeconvolutionarchitectureforheterogeneoussystemsonchip
AT corsonellopasquale efficientdeconvolutionarchitectureforheterogeneoussystemsonchip