
An Accelerator Design Using a MTCA Decomposition Algorithm for CNNs

Due to the high throughput and high computing capability of convolutional neural networks (CNNs), researchers are paying increasing attention to the design of CNN hardware accelerator architectures. Accordingly, in this paper, we propose a block parallel computing algorithm based on the matrix trans...


Bibliographic Details
Main Authors:	Zhao, Yunping, Lu, Jianzhuang, Chen, Xiaowen
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7583864/ https://www.ncbi.nlm.nih.gov/pubmed/32998366 http://dx.doi.org/10.3390/s20195558

_version_	1783599474731384832
author	Zhao, Yunping Lu, Jianzhuang Chen, Xiaowen
author_facet	Zhao, Yunping Lu, Jianzhuang Chen, Xiaowen
author_sort	Zhao, Yunping
collection	PubMed
description	Due to the high throughput and high computing capability of convolutional neural networks (CNNs), researchers are paying increasing attention to the design of CNN hardware accelerator architectures. Accordingly, in this paper, we propose a block parallel computing algorithm based on the matrix transformation computing algorithm (MTCA) to realize the convolution expansion and resolve the block problem of the intermediate matrix. It enables highly parallel implementation on hardware. Moreover, we provide a specific calculation method for the optimal partition of matrix multiplication to optimize performance. In our evaluation, the proposed method saves more than 60% of hardware storage space compared with the im2col (image to column) approach. More specifically, in the case of large-scale convolutions, it saves nearly 82% of storage space. Under the accelerator architecture framework designed in this paper, we achieve a performance of 26.7–33.4 GFLOPS (depending on convolution type) on an FPGA (Field Programmable Gate Array) by reducing bandwidth and improving data reusability. It is 1.2×–4.0× faster than memory-efficient convolution (MEC) and im2col, respectively, and represents an effective solution for a large-scale convolution accelerator.
format	Online Article Text
id	pubmed-7583864
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7583864 2020-10-29 Sensors (Basel) Article MDPI 2020-09-28 /pmc/articles/PMC7583864/ /pubmed/32998366 http://dx.doi.org/10.3390/s20195558 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	An Accelerator Design Using a MTCA Decomposition Algorithm for CNNs
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7583864/ https://www.ncbi.nlm.nih.gov/pubmed/32998366 http://dx.doi.org/10.3390/s20195558
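The description above compares storage against the im2col (image to column) baseline. As a rough illustration of why that baseline is storage-hungry, the following minimal sketch implements plain im2col convolution for a single-channel image in pure Python. It is not the authors' MTCA method, and the names `im2col`, `conv2d_im2col`, and `expansion_factor` are hypothetical helpers chosen for this example only.

```python
def im2col(image, k, stride=1):
    """Unroll every k x k window of a 2-D image into one row of a matrix."""
    h, w = len(image), len(image[0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    rows = []
    for i in range(0, out_h * stride, stride):
        for j in range(0, out_w * stride, stride):
            # Each output position copies its whole window, so overlapping
            # pixels are duplicated in the intermediate matrix.
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows, out_h, out_w


def conv2d_im2col(image, kernel, stride=1):
    """Convolution (cross-correlation form) as a matrix-vector product."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    cols, out_h, out_w = im2col(image, k, stride)
    flat_out = [sum(a * b for a, b in zip(row, flat_kernel)) for row in cols]
    # Reshape the flat result back into an out_h x out_w feature map.
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


def expansion_factor(h, w, k, stride=1):
    """Ratio of intermediate-matrix elements to input elements."""
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    return (out_h * out_w * k * k) / (h * w)
```

For a 4×4 input and a 3×3 kernel with stride 1, `expansion_factor(4, 4, 3)` is 2.25: the intermediate matrix holds 36 values for a 16-value input. This duplication is the overhead that the storage figures in the description are measured against.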


Similar Items