Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracies they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfo...
Main authors: | Perri, Stefania; Sestito, Cristian; Spagnolo, Fanny; Corsonello, Pasquale |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321086/ https://www.ncbi.nlm.nih.gov/pubmed/34460742 http://dx.doi.org/10.3390/jimaging6090085 |
_version_ | 1783730768042786816 |
---|---|
author | Perri, Stefania Sestito, Cristian Spagnolo, Fanny Corsonello, Pasquale |
author_facet | Perri, Stefania Sestito, Cristian Spagnolo, Fanny Corsonello, Pasquale |
author_sort | Perri, Stefania |
collection | PubMed |
description | Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracies they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, still quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator is easily scalable to comply with the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies ~5.6%, ~4.1%, ~17%, and ~96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the greater exploitable parallelism, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to ~20% higher, and saves more than 60% and 80% of power consumption and logic resource requirements, respectively, using ~5.7× fewer on-chip memory resources. |
format | Online Article Text |
id | pubmed-8321086 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8321086 2021-08-26 Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip Perri, Stefania Sestito, Cristian Spagnolo, Fanny Corsonello, Pasquale J Imaging Article Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracies they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, still quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator is easily scalable to comply with the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies ~5.6%, ~4.1%, ~17%, and ~96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the greater exploitable parallelism, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to ~20% higher, and saves more than 60% and 80% of power consumption and logic resource requirements, respectively, using ~5.7× fewer on-chip memory resources. MDPI 2020-08-25 /pmc/articles/PMC8321086/ /pubmed/34460742 http://dx.doi.org/10.3390/jimaging6090085 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Perri, Stefania Sestito, Cristian Spagnolo, Fanny Corsonello, Pasquale Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip |
title | Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip |
title_full | Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip |
title_fullStr | Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip |
title_full_unstemmed | Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip |
title_short | Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip |
title_sort | efficient deconvolution architecture for heterogeneous systems-on-chip |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321086/ https://www.ncbi.nlm.nih.gov/pubmed/34460742 http://dx.doi.org/10.3390/jimaging6090085 |
work_keys_str_mv | AT perristefania efficientdeconvolutionarchitectureforheterogeneoussystemsonchip AT sestitocristian efficientdeconvolutionarchitectureforheterogeneoussystemsonchip AT spagnolofanny efficientdeconvolutionarchitectureforheterogeneoussystemsonchip AT corsonellopasquale efficientdeconvolutionarchitectureforheterogeneoussystemsonchip |
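
The description above refers to 2D deconvolution (also called transposed convolution) without showing the operation itself. As a point of reference only, the following is a minimal software sketch of the textbook scatter-style formulation for a single channel with no padding. It is not the hardware-oriented approach proposed in the article, whose details are not given in this record. The function name `deconv2d` and the `stride` parameter are hypothetical names chosen for illustration.

```python
def deconv2d(inp, kernel, stride=1):
    """Naive single-channel 2D transposed convolution, no padding.

    Illustrative reference only (assumption): this is the textbook
    formulation, not the accelerator architecture from the article.
    `inp` and `kernel` are rectangular lists of lists of numbers.
    """
    in_h, in_w = len(inp), len(inp[0])
    k_h, k_w = len(kernel), len(kernel[0])

    # Standard output size of a transposed convolution without padding.
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0] * out_w for _ in range(out_h)]

    # Each input pixel scatters a scaled copy of the kernel into the
    # output; overlapping contributions are accumulated.
    for i in range(in_h):
        for j in range(in_w):
            for a in range(k_h):
                for b in range(k_w):
                    out[i * stride + a][j * stride + b] += inp[i][j] * kernel[a][b]
    return out


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    ones = [[1, 1], [1, 1]]
    for row in deconv2d(image, ones, stride=2):
        print(row)
```

With `stride` equal to the kernel size the scattered kernel copies do not overlap, so each input pixel is simply replicated into its own block; with a smaller stride the copies overlap and are summed, which is the accumulation cost that purpose-designed accelerators generally try to handle efficiently.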