Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors
The convergence of artificial intelligence (AI) is one of the critical technologies in the recent fourth industrial revolution. The AIoT (Artificial Intelligence Internet of Things) is expected to be a solution that aids rapid and secure data processing. While the success of AIoT demanded low-power...
Main Authors: Junaid, Muhammad; Arslan, Saad; Lee, TaeGeon; Kim, HyungWon
Format: Online Article Text
Language: English
Published: MDPI, 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8840430/ https://www.ncbi.nlm.nih.gov/pubmed/35161975 http://dx.doi.org/10.3390/s22031230
_version_ | 1784650617530089472 |
---|---|
author | Junaid, Muhammad Arslan, Saad Lee, TaeGeon Kim, HyungWon |
author_facet | Junaid, Muhammad Arslan, Saad Lee, TaeGeon Kim, HyungWon |
author_sort | Junaid, Muhammad |
collection | PubMed |
description | The convergence of artificial intelligence (AI) is one of the critical technologies of the recent fourth industrial revolution, and the AIoT (Artificial Intelligence Internet of Things) is expected to be a solution that enables rapid and secure data processing. While the success of AIoT demands low-power neural network processors, most recent research has focused on accelerator designs for inference only. The growing interest in self-supervised and semi-supervised learning now calls for processors that offload training in addition to inference. Training with high accuracy goals requires floating-point operators, but higher-precision floating-point arithmetic architectures in neural networks tend to consume large area and energy; consequently, an energy-efficient, compact accelerator is required. The proposed architecture incorporates training in 32-bit, 24-bit, 16-bit, and mixed precisions to find the optimal floating-point format for low-power, small-area edge devices. The proposed accelerator engines have been verified on an FPGA for both inference and training on the MNIST image dataset. The combination of a 24-bit custom FP format with 16-bit Brain FP achieves an accuracy of more than 93%. ASIC implementation of this optimized mixed-precision accelerator in TSMC 65 nm technology shows an active area of 1.036 × 1.036 mm² and an energy consumption of 4.445 µJ per training of one image. Compared with the 32-bit architecture, the area and energy are reduced by 4.7× and 3.91×, respectively. Therefore, a CNN structure using floating-point numbers with an optimized data path will contribute significantly to the AIoT field, which requires small area, low energy, and high accuracy. (An illustrative sketch of these floating-point formats follows the record below.) |
format | Online Article Text |
id | pubmed-8840430 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8840430 2022-02-13 Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors Junaid, Muhammad Arslan, Saad Lee, TaeGeon Kim, HyungWon Sensors (Basel) Article MDPI 2022-02-06 /pmc/articles/PMC8840430/ /pubmed/35161975 http://dx.doi.org/10.3390/s22031230 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Junaid, Muhammad Arslan, Saad Lee, TaeGeon Kim, HyungWon Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors |
title | Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors |
title_full | Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors |
title_fullStr | Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors |
title_full_unstemmed | Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors |
title_short | Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors |
title_sort | optimal architecture of floating-point arithmetic for neural network training processors |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8840430/ https://www.ncbi.nlm.nih.gov/pubmed/35161975 http://dx.doi.org/10.3390/s22031230 |
work_keys_str_mv | AT junaidmuhammad optimalarchitectureoffloatingpointarithmeticforneuralnetworktrainingprocessors AT arslansaad optimalarchitectureoffloatingpointarithmeticforneuralnetworktrainingprocessors AT leetaegeon optimalarchitectureoffloatingpointarithmeticforneuralnetworktrainingprocessors AT kimhyungwon optimalarchitectureoffloatingpointarithmeticforneuralnetworktrainingprocessors |
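The abstract above contrasts full 32-bit floating point with a 24-bit custom FP format and 16-bit Brain FP (bfloat16). As a rough illustration of what such reduced-precision formats mean, and not a reproduction of the authors' hardware data path, the Python sketch below truncates an IEEE-754 FP32 value to bfloat16 (1 sign, 8 exponent, 7 mantissa bits) and to an assumed 24-bit layout with 1 sign, 8 exponent, and 15 mantissa bits; the exact bit layout and rounding of the paper's custom format are assumptions here.

```python
import struct

def float_to_bits(x: float) -> int:
    # Reinterpret a value as its IEEE-754 single-precision (FP32) bit pattern.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(bits: int) -> float:
    # Reinterpret a 32-bit pattern back into an FP32 value.
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def to_bfloat16(x: float) -> float:
    # 16-bit Brain FP: 1 sign, 8 exponent, 7 mantissa bits.
    # Plain truncation for clarity; real hardware often rounds to nearest even.
    return bits_to_float(float_to_bits(x) & 0xFFFF0000)

def to_fp24(x: float) -> float:
    # Hypothetical 24-bit custom FP: 1 sign, 8 exponent, 15 mantissa bits.
    # The record does not specify the paper's exact 24-bit layout.
    return bits_to_float(float_to_bits(x) & 0xFFFFFF00)

if __name__ == "__main__":
    w = 0.123456789
    print(f"fp32: {w:.9f}")               # reference value
    print(f"fp24: {to_fp24(w):.9f}")      # small rounding error, FP32 exponent range
    print(f"bf16: {to_bfloat16(w):.9f}")  # larger rounding error, FP32 exponent range
```

Because both reduced formats keep the full 8-bit FP32 exponent, they preserve the dynamic range that training gradients need while narrowing the mantissa data path, which is broadly why such formats reduce area and energy relative to 32-bit arithmetic.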