A Low-Power Hardware Architecture for Real-Time CNN Computing
Convolutional neural networks (CNNs) are widely deployed on edge devices, performing tasks such as object detection, image recognition and acoustic recognition. However, the limited resources and strict power constraints of edge devices pose a great challenge to applying the computationally intensiv...
Main authors: | Liu, Xinyu; Cao, Chenhong; Duan, Shengyu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9965634/ https://www.ncbi.nlm.nih.gov/pubmed/36850642 http://dx.doi.org/10.3390/s23042045 |
_version_ | 1784896813966295040 |
---|---|
author | Liu, Xinyu Cao, Chenhong Duan, Shengyu |
author_facet | Liu, Xinyu Cao, Chenhong Duan, Shengyu |
author_sort | Liu, Xinyu |
collection | PubMed |
description | Convolutional neural networks (CNNs) are widely deployed on edge devices, performing tasks such as object detection, image recognition and acoustic recognition. However, the limited resources and strict power constraints of edge devices pose a great challenge to applying computationally intensive CNN models. In addition, for edge applications with real-time requirements, such as real-time computing (RTC) systems, the computations must be completed within the required timing constraint, which makes it more difficult to trade off computational latency against power consumption. In this paper, we propose a low-power CNN accelerator for edge inference in RTC systems, where the computations are performed in a column-wise manner so that the currently available input data can be processed immediately. We observe that most computations of some CNN kernels in deep layers can be completed over multiple cycles without affecting the overall computational latency. Thus, we present a multi-cycle scheme for the column-wise convolutional operations to reduce hardware resources and power consumption. We present a hardware architecture for the multi-cycle scheme as a domain-specific CNN architecture, which is then implemented in a 65 nm technology. We show that our proposed approach achieves power reductions of up to 8.45%, 49.41% and 50.64% for LeNet, AlexNet and VGG16, respectively. The experimental results show that our approach tends to yield a larger power reduction for CNN models with greater depth, larger kernels and more channels. |
format | Online Article Text |
id | pubmed-9965634 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9965634 2023-02-26 A Low-Power Hardware Architecture for Real-Time CNN Computing Liu, Xinyu Cao, Chenhong Duan, Shengyu Sensors (Basel) Article Convolutional neural networks (CNNs) are widely deployed on edge devices, performing tasks such as object detection, image recognition and acoustic recognition. However, the limited resources and strict power constraints of edge devices pose a great challenge to applying computationally intensive CNN models. In addition, for edge applications with real-time requirements, such as real-time computing (RTC) systems, the computations must be completed within the required timing constraint, which makes it more difficult to trade off computational latency against power consumption. In this paper, we propose a low-power CNN accelerator for edge inference in RTC systems, where the computations are performed in a column-wise manner so that the currently available input data can be processed immediately. We observe that most computations of some CNN kernels in deep layers can be completed over multiple cycles without affecting the overall computational latency. Thus, we present a multi-cycle scheme for the column-wise convolutional operations to reduce hardware resources and power consumption. We present a hardware architecture for the multi-cycle scheme as a domain-specific CNN architecture, which is then implemented in a 65 nm technology. We show that our proposed approach achieves power reductions of up to 8.45%, 49.41% and 50.64% for LeNet, AlexNet and VGG16, respectively. The experimental results show that our approach tends to yield a larger power reduction for CNN models with greater depth, larger kernels and more channels. MDPI 2023-02-11 /pmc/articles/PMC9965634/ /pubmed/36850642 http://dx.doi.org/10.3390/s23042045 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Liu, Xinyu Cao, Chenhong Duan, Shengyu A Low-Power Hardware Architecture for Real-Time CNN Computing |
title | A Low-Power Hardware Architecture for Real-Time CNN Computing |
title_full | A Low-Power Hardware Architecture for Real-Time CNN Computing |
title_fullStr | A Low-Power Hardware Architecture for Real-Time CNN Computing |
title_full_unstemmed | A Low-Power Hardware Architecture for Real-Time CNN Computing |
title_short | A Low-Power Hardware Architecture for Real-Time CNN Computing |
title_sort | low-power hardware architecture for real-time cnn computing |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9965634/ https://www.ncbi.nlm.nih.gov/pubmed/36850642 http://dx.doi.org/10.3390/s23042045 |
work_keys_str_mv | AT liuxinyu alowpowerhardwarearchitectureforrealtimecnncomputing AT caochenhong alowpowerhardwarearchitectureforrealtimecnncomputing AT duanshengyu alowpowerhardwarearchitectureforrealtimecnncomputing AT liuxinyu lowpowerhardwarearchitectureforrealtimecnncomputing AT caochenhong lowpowerhardwarearchitectureforrealtimecnncomputing AT duanshengyu lowpowerhardwarearchitectureforrealtimecnncomputing |
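
The description above mentions convolution carried out "in a column-wise manner" so that input data can be used as soon as it is available. The record gives no implementation details, so the following is only a minimal software sketch of that general idea, not the authors' hardware architecture: input columns arrive one at a time, and each one immediately updates the partial sums of every output column it contributes to. The function names, the square-kernel restriction and the list-of-columns layout are all assumptions made for illustration.

```python
def conv2d_direct(image, kernel):
    """Reference 'valid' convolution (cross-correlation) over the whole image."""
    h, w = len(image), len(image[0])
    k = len(kernel)  # assumption: square k x k kernel
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(w - k + 1)
        ]
        for r in range(h - k + 1)
    ]


def conv2d_columnwise(columns, kernel, height):
    """Hypothetical streaming variant: consume one input column at a time.

    Each arriving column x updates the partial sums of every output
    column whose window covers x, so no work waits for the full image.
    """
    k = len(kernel)
    out_h = height - k + 1
    partial = {}   # output column index -> running sums (out_h values)
    finished = []  # completed output columns, in arrival order
    for x, col in enumerate(columns):
        # Output column c is built from input columns c .. c + k - 1.
        for c in range(max(0, x - k + 1), x + 1):
            j = x - c  # kernel column aligned with this input column
            acc = partial.setdefault(c, [0] * out_h)
            for r in range(out_h):
                acc[r] += sum(col[r + i] * kernel[i][j] for i in range(k))
        # Output column x - k + 1 has now seen all k of its input columns.
        done = x - k + 1
        if done >= 0:
            finished.append(partial.pop(done))
    # Partial sums left in `partial` belong to windows that run past the
    # last input column and are discarded.
    return finished  # list of output columns (transposed layout)
```

Transposing the returned columns gives the same result as the direct computation, which is one way to check the sketch. It does not model the multi-cycle scheduling or any power behaviour described in the abstract.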