Cargando…
AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning
Convolutional neural networks (CNNs) play a crucial role in many EdgeAI and TinyML applications, but their implementation usually requires external memory, which degrades the feasibility of such resource-hungry environments. To solve this problem, this paper proposes memory-reduction methods at the...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
MDPI
2023
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575357/ https://www.ncbi.nlm.nih.gov/pubmed/37836934 http://dx.doi.org/10.3390/s23198104 |
_version_ | 1785120903308247040 |
---|---|
author | Kang, Hyeong-Ju Yang, Byung-Do |
author_facet | Kang, Hyeong-Ju Yang, Byung-Do |
author_sort | Kang, Hyeong-Ju |
collection | PubMed |
description | Convolutional neural networks (CNNs) play a crucial role in many EdgeAI and TinyML applications, but their implementation usually requires external memory, which degrades the feasibility of such resource-hungry environments. To solve this problem, this paper proposes memory-reduction methods at the algorithm and architecture level, implementing a reasonable-performance CNN with the on-chip memory of a practical device. At the algorithm level, accelerator-aware pruning is adopted to reduce the weight memory amount. For activation memory reduction, a stream-based line-buffer architecture is proposed. In the proposed architecture, each layer is implemented by a dedicated block, and the layer blocks operate in a pipelined way. Each block has a line buffer to store a few rows of input data instead of a frame buffer to store the whole feature map, reducing intermediate data-storage size. The experimental results show that the object-detection CNNs of MobileNetV1/V2 and an SSDLite variant, widely used in TinyML applications, can be implemented even on a low-end FPGA without external memory. |
format | Online Article Text |
id | pubmed-10575357 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-105753572023-10-14 AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning Kang, Hyeong-Ju Yang, Byung-Do Sensors (Basel) Article Convolutional neural networks (CNNs) play a crucial role in many EdgeAI and TinyML applications, but their implementation usually requires external memory, which degrades the feasibility of such resource-hungry environments. To solve this problem, this paper proposes memory-reduction methods at the algorithm and architecture level, implementing a reasonable-performance CNN with the on-chip memory of a practical device. At the algorithm level, accelerator-aware pruning is adopted to reduce the weight memory amount. For activation memory reduction, a stream-based line-buffer architecture is proposed. In the proposed architecture, each layer is implemented by a dedicated block, and the layer blocks operate in a pipelined way. Each block has a line buffer to store a few rows of input data instead of a frame buffer to store the whole feature map, reducing intermediate data-storage size. The experimental results show that the object-detection CNNs of MobileNetV1/V2 and an SSDLite variant, widely used in TinyML applications, can be implemented even on a low-end FPGA without external memory. MDPI 2023-09-27 /pmc/articles/PMC10575357/ /pubmed/37836934 http://dx.doi.org/10.3390/s23198104 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Kang, Hyeong-Ju Yang, Byung-Do AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning |
title | AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning |
title_full | AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning |
title_fullStr | AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning |
title_full_unstemmed | AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning |
title_short | AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning |
title_sort | aocstream: all-on-chip cnn accelerator with stream-based line-buffer architecture and accelerator-aware pruning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575357/ https://www.ncbi.nlm.nih.gov/pubmed/37836934 http://dx.doi.org/10.3390/s23198104 |
work_keys_str_mv | AT kanghyeongju aocstreamallonchipcnnacceleratorwithstreambasedlinebufferarchitectureandacceleratorawarepruning AT yangbyungdo aocstreamallonchipcnnacceleratorwithstreambasedlinebufferarchitectureandacceleratorawarepruning |