An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost
| Main Authors: | Gao, Muxuan; Chen, He; Liu, Dake |
| --- | --- |
| Format: | Online Article Text |
| Language: | English |
| Published: | MDPI, 2022 |
| Subjects: | Article |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9146143/ https://www.ncbi.nlm.nih.gov/pubmed/35632250 http://dx.doi.org/10.3390/s22103841 |
_version_ | 1784716488000667648 |
author | Gao, Muxuan; Chen, He; Liu, Dake
author_facet | Gao, Muxuan; Chen, He; Liu, Dake
author_sort | Gao, Muxuan |
collection | PubMed |
description | The computation efficiency and flexibility of the accelerator hinder deep neural network (DNN) implementation in embedded applications. Although there are many publications on DNN processors, there is still much room for deep optimization to further improve results. Multiple dimensions must be considered simultaneously when designing a DNN processor to reach the performance limit of the architecture, including architecture decision, flexibility, energy efficiency, and silicon cost minimization. Flexibility is defined as the ability to support as many networks as possible and to easily adjust the scale. For energy efficiency, there are large opportunities for power optimization, involving memory access minimization and memory latency minimization based on minimized on-chip memory. Therefore, this work focused on low-power and low-latency data access with minimized silicon cost. The design was implemented as an ASIP (application-specific instruction set processor) whose ISA was based on the Caffe2 inference operators and whose hardware was based on a single-instruction multiple-data (SIMD) architecture. The scalability and system performance of our SoC extension scheme were demonstrated. VLIW execution was used to issue multiple instructions in parallel, so all data access time for the convolution layers was hidden. Finally, the processor was synthesized in TSMC 65 nm technology at a 200 MHz clock, and the SoC extension scheme was analyzed in an experimental model. Our design was tested on several typical neural networks, achieving 196 GOPS at 200 MHz and 241 GOPS/W on VGG16Net and AlexNet. (An illustrative sketch of the memory-latency-hiding idea follows this record.) |
format | Online Article Text |
id | pubmed-9146143 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9146143 2022-05-29 An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost Gao, Muxuan; Chen, He; Liu, Dake Sensors (Basel) Article The computation efficiency and flexibility of the accelerator hinder deep neural network (DNN) implementation in embedded applications. Although there are many publications on DNN processors, there is still much room for deep optimization to further improve results. Multiple dimensions must be considered simultaneously when designing a DNN processor to reach the performance limit of the architecture, including architecture decision, flexibility, energy efficiency, and silicon cost minimization. Flexibility is defined as the ability to support as many networks as possible and to easily adjust the scale. For energy efficiency, there are large opportunities for power optimization, involving memory access minimization and memory latency minimization based on minimized on-chip memory. Therefore, this work focused on low-power and low-latency data access with minimized silicon cost. The design was implemented as an ASIP (application-specific instruction set processor) whose ISA was based on the Caffe2 inference operators and whose hardware was based on a single-instruction multiple-data (SIMD) architecture. The scalability and system performance of our SoC extension scheme were demonstrated. VLIW execution was used to issue multiple instructions in parallel, so all data access time for the convolution layers was hidden. Finally, the processor was synthesized in TSMC 65 nm technology at a 200 MHz clock, and the SoC extension scheme was analyzed in an experimental model. Our design was tested on several typical neural networks, achieving 196 GOPS at 200 MHz and 241 GOPS/W on VGG16Net and AlexNet. MDPI 2022-05-19 /pmc/articles/PMC9146143/ /pubmed/35632250 http://dx.doi.org/10.3390/s22103841 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Gao, Muxuan Chen, He Liu, Dake An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost |
title | An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost |
title_full | An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost |
title_fullStr | An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost |
title_full_unstemmed | An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost |
title_short | An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost |
title_sort | asip for neural network inference on embedded devices with 99% pe utilization and 100% memory hidden under low silicon cost |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9146143/ https://www.ncbi.nlm.nih.gov/pubmed/35632250 http://dx.doi.org/10.3390/s22103841 |
work_keys_str_mv | AT gaomuxuan anasipforneuralnetworkinferenceonembeddeddeviceswith99peutilizationand100memoryhiddenunderlowsiliconcost AT chenhe anasipforneuralnetworkinferenceonembeddeddeviceswith99peutilizationand100memoryhiddenunderlowsiliconcost AT liudake anasipforneuralnetworkinferenceonembeddeddeviceswith99peutilizationand100memoryhiddenunderlowsiliconcost AT gaomuxuan asipforneuralnetworkinferenceonembeddeddeviceswith99peutilizationand100memoryhiddenunderlowsiliconcost AT chenhe asipforneuralnetworkinferenceonembeddeddeviceswith99peutilizationand100memoryhiddenunderlowsiliconcost AT liudake asipforneuralnetworkinferenceonembeddeddeviceswith99peutilizationand100memoryhiddenunderlowsiliconcost |
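To make the "memory hidden" claim in the description above concrete, here is a minimal timing sketch, not taken from the paper: every function name, tile size, and bandwidth figure below is an illustrative assumption. It shows the standard double-buffering argument: if streaming the next convolution tile from off-chip memory takes no more cycles than computing the current tile, the data access time is completely overlapped with compute. The 490 MACs/cycle figure is only a back-of-envelope reading of the reported 196 GOPS at 200 MHz, counting one multiply-accumulate as two operations.

```python
# Hypothetical timing model (not from the paper): checks whether the DMA
# transfer of the next convolution tile can be fully overlapped with the
# compute of the current tile, i.e. whether memory latency is hidden.
# All tile sizes and bandwidth figures are illustrative assumptions.

def tile_compute_cycles(out_h, out_w, out_c, in_c, k, macs_per_cycle):
    """Cycles needed to compute one output tile of a k x k convolution."""
    macs = out_h * out_w * out_c * in_c * k * k
    return macs / macs_per_cycle

def tile_dma_cycles(in_h, in_w, in_c, bytes_per_elem, bytes_per_cycle):
    """Cycles needed to stream one input tile from off-chip memory."""
    return (in_h * in_w * in_c * bytes_per_elem) / bytes_per_cycle

# Example: a 16x16 output tile of a 3x3, 64-in / 64-out channel layer on a
# datapath assumed to sustain 490 MACs/cycle (196 GOPS at 200 MHz, one MAC
# counted as two operations), fed by an assumed 16-byte/cycle DMA channel.
compute = tile_compute_cycles(16, 16, 64, 64, 3, macs_per_cycle=490)
dma = tile_dma_cycles(18, 18, 64, bytes_per_elem=1, bytes_per_cycle=16)

print(f"compute: {compute:,.0f} cycles per tile")
print(f"dma:     {dma:,.0f} cycles per tile")
print("memory latency fully hidden" if dma <= compute else "tile is memory-bound")
```

In a double-buffered (ping-pong) arrangement, the tile loop issues the DMA for tile i+1 before starting the MAC loop for tile i; whenever the DMA cycle count stays at or below the compute cycle count, the processing elements never stall on memory, which is the kind of overlap a SIMD/VLIW datapath like the one described here relies on.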