
Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip

Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracy they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfo...


Bibliographic Details
Main authors:	Perri, Stefania; Sestito, Cristian; Spagnolo, Fanny; Corsonello, Pasquale
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321086/ https://www.ncbi.nlm.nih.gov/pubmed/34460742 http://dx.doi.org/10.3390/jimaging6090085

author	Perri, Stefania; Sestito, Cristian; Spagnolo, Fanny; Corsonello, Pasquale
collection	PubMed
description	Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracy they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand still make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator is easily scalable to comply with the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies ~5.6%, ~4.1%, ~17%, and ~96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the greater exploitable parallelism, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to ~20% higher and reduces power consumption and logic resource requirements by more than 60% and 80%, respectively, while using ~5.7× fewer on-chip memory resources.
format	Online Article Text
id	pubmed-8321086
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8321086 2021-08-26 Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip Perri, Stefania Sestito, Cristian Spagnolo, Fanny Corsonello, Pasquale J Imaging Article MDPI 2020-08-25 /pmc/articles/PMC8321086/ /pubmed/34460742 http://dx.doi.org/10.3390/jimaging6090085 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321086/ https://www.ncbi.nlm.nih.gov/pubmed/34460742 http://dx.doi.org/10.3390/jimaging6090085
