Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
Main Authors: Zhao, Junyun; Huang, Siyuan; Yousuf, Osama; Gao, Yutong; Hoskins, Brian D.; Adam, Gina C.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2021
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645649/ https://www.ncbi.nlm.nih.gov/pubmed/34880721 http://dx.doi.org/10.3389/fnins.2021.749811
_version_ | 1784610353388191744 |
author | Zhao, Junyun Huang, Siyuan Yousuf, Osama Gao, Yutong Hoskins, Brian D. Adam, Gina C. |
author_facet | Zhao, Junyun Huang, Siyuan Yousuf, Osama Gao, Yutong Hoskins, Brian D. Adam, Gina C. |
author_sort | Zhao, Junyun |
collection | PubMed |
description | While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators. |
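The description combines three algorithmic ingredients: mini-batch gradient averaging, low-rank decomposition of the gradient, and stochastic rounding of the quantized update. A minimal NumPy sketch of how these pieces could fit together is below; the truncated SVD stands in for the paper's streaming batch PCA and NMF decompositions, and the quantization step `delta` is an assumed device parameter, not a value taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, delta):
    """Round x to multiples of the conductance step `delta` stochastically,
    so that small updates survive in expectation instead of vanishing."""
    scaled = x / delta
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional remainder.
    return delta * (floor + (rng.random(x.shape) < (scaled - floor)))

def low_rank_gradient(grads, rank):
    """Average a mini-batch of gradients (MBGD) and compress the result to
    rank `rank` via truncated SVD -- a stand-in for the streaming batch
    PCA / NMF decompositions studied in the paper."""
    g_mean = grads.mean(axis=0)
    u, s, vt = np.linalg.svd(g_mean, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]  # factors L (m x r), R (r x n)

# Toy example: 8 per-example gradients for a 16x10 weight matrix.
grads = 1e-3 * rng.standard_normal((8, 16, 10))
L, R = low_rank_gradient(grads, rank=3)

delta = 1e-3  # assumed minimum programmable conductance step (hypothetical)

# rank-sum: reconstruct the full update externally, then round once.
update_rank_sum = stochastic_round(L @ R, delta)

# rank-seq: apply one rank-1 outer product at a time, rounding each,
# as would happen when reconstruction occurs inside the memristor array.
update_rank_seq = sum(stochastic_round(np.outer(L[:, k], R[k]), delta)
                      for k in range(L.shape[1]))
```

The two update paths at the end mirror the abstract's distinction: rank-sum reconstructs the full update externally to the array before programming, while rank-seq transfers one rank-1 component at a time, so each component is rounded separately inside the memristor array.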
format | Online Article Text |
id | pubmed-8645649 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8645649 2021-12-07 Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices Zhao, Junyun Huang, Siyuan Yousuf, Osama Gao, Yutong Hoskins, Brian D. Adam, Gina C. Front Neurosci Neuroscience While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators. Frontiers Media S.A. 2021-11-22 /pmc/articles/PMC8645649/ /pubmed/34880721 http://dx.doi.org/10.3389/fnins.2021.749811 Text en Copyright © 2021 Zhao, Huang, Yousuf, Gao, Hoskins and Adam. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Zhao, Junyun Huang, Siyuan Yousuf, Osama Gao, Yutong Hoskins, Brian D. Adam, Gina C. Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_full | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_fullStr | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_full_unstemmed | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_short | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_sort | gradient decomposition methods for training neural networks with non-ideal synaptic devices |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645649/ https://www.ncbi.nlm.nih.gov/pubmed/34880721 http://dx.doi.org/10.3389/fnins.2021.749811 |
work_keys_str_mv | AT zhaojunyun gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT huangsiyuan gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT yousufosama gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT gaoyutong gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT hoskinsbriand gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT adamginac gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices |