
Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices


Bibliographic Details
Main Authors: Zhao, Junyun, Huang, Siyuan, Yousuf, Osama, Gao, Yutong, Hoskins, Brian D., Adam, Gina C.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645649/
https://www.ncbi.nlm.nih.gov/pubmed/34880721
http://dx.doi.org/10.3389/fnins.2021.749811
_version_ 1784610353388191744
author Zhao, Junyun
Huang, Siyuan
Yousuf, Osama
Gao, Yutong
Hoskins, Brian D.
Adam, Gina C.
author_facet Zhao, Junyun
Huang, Siyuan
Yousuf, Osama
Gao, Yutong
Hoskins, Brian D.
Adam, Gina C.
author_sort Zhao, Junyun
collection PubMed
description While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks, at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
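The two device-aware ideas in the abstract, stochastic rounding of small weight updates and low-rank decomposition of the mini-batch gradient with rank-sum versus rank-seq reconstruction, can be sketched in a few lines of NumPy. This is a minimal illustration only: the step size, layer shapes, and the use of a plain truncated SVD in place of the paper's streaming batch PCA or NMF are all assumptions, and the array is treated as ideal (linear).

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(update, step):
    # Quantize each entry to a multiple of the device step size, rounding up
    # with probability equal to the fractional remainder. Updates smaller than
    # one step are then applied with nonzero probability instead of vanishing,
    # as plain nearest-step rounding would make them.
    scaled = update / step
    low = np.floor(scaled)
    up = rng.random(update.shape) < (scaled - low)
    return (low + up) * step

# Mini-batch gradient of one fully connected layer: the average of rank-1
# outer products (error x activation^T) over the batch.
batch, n_out, n_in, rank = 32, 10, 100, 3
errs = rng.standard_normal((batch, n_out))
acts = rng.standard_normal((batch, n_in))
G = errs.T @ acts / batch

# Rank-r approximation via truncated SVD (a stand-in for the streaming
# decomposition). "rank-sum": reconstruct the full low-rank matrix externally,
# then transfer it to the array in one shot.
U, s, Vt = np.linalg.svd(G, full_matrices=False)
G_sum = (U[:, :rank] * s[:rank]) @ Vt[:rank]

# "rank-seq": apply the r rank-1 components one at a time on-array; with
# ideal linear devices the final state matches the rank-sum transfer.
W = np.zeros_like(G)
for k in range(rank):
    W += s[k] * np.outer(U[:, k], Vt[k])
```

The memory saving comes from storing the factors instead of the dense gradient: rank * (n_out + n_in + 1) numbers rather than n_out * n_in, e.g. 3 * 111 = 333 versus 1000 here. On real devices the two transfer orders differ, because each rank-1 programming step sees the array's non-ideal update response.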
format Online
Article
Text
id pubmed-8645649
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8645649 2021-12-07 Front Neurosci Neuroscience Frontiers Media S.A. 2021-11-22 /pmc/articles/PMC8645649/ /pubmed/34880721 http://dx.doi.org/10.3389/fnins.2021.749811 Text en Copyright © 2021 Zhao, Huang, Yousuf, Gao, Hoskins and Adam. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Zhao, Junyun
Huang, Siyuan
Yousuf, Osama
Gao, Yutong
Hoskins, Brian D.
Adam, Gina C.
Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
title Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
title_full Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
title_fullStr Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
title_full_unstemmed Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
title_short Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
title_sort gradient decomposition methods for training neural networks with non-ideal synaptic devices
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645649/
https://www.ncbi.nlm.nih.gov/pubmed/34880721
http://dx.doi.org/10.3389/fnins.2021.749811
work_keys_str_mv AT zhaojunyun gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices
AT huangsiyuan gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices
AT yousufosama gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices
AT gaoyutong gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices
AT hoskinsbriand gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices
AT adamginac gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices