Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices
While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
Main Authors: Zhao, Junyun; Huang, Siyuan; Yousuf, Osama; Gao, Yutong; Hoskins, Brian D.; Adam, Gina C.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2021
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645649/ https://www.ncbi.nlm.nih.gov/pubmed/34880721 http://dx.doi.org/10.3389/fnins.2021.749811
_version_ | 1784610353388191744 |
author | Zhao, Junyun Huang, Siyuan Yousuf, Osama Gao, Yutong Hoskins, Brian D. Adam, Gina C. |
author_facet | Zhao, Junyun Huang, Siyuan Yousuf, Osama Gao, Yutong Hoskins, Brian D. Adam, Gina C. |
author_sort | Zhao, Junyun |
collection | PubMed |
description | While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators. |
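The description combines three algorithmic ingredients: mini-batch gradient averaging, low-rank decomposition of the gradient, and stochastic rounding of the quantized update. A minimal NumPy sketch of how these pieces could fit together is below; the truncated SVD stands in for the paper's streaming batch PCA and NMF decompositions, and the quantization step `delta` is an assumed device parameter, not a value taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, delta):
    """Round x to multiples of the conductance step `delta` stochastically,
    so that small updates survive in expectation instead of vanishing."""
    scaled = x / delta
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional remainder.
    return delta * (floor + (rng.random(x.shape) < (scaled - floor)))

def low_rank_gradient(grads, rank):
    """Average a mini-batch of gradients (MBGD) and compress the result to
    rank `rank` via truncated SVD -- a stand-in for the streaming batch
    PCA / NMF decompositions studied in the paper."""
    g_mean = grads.mean(axis=0)
    u, s, vt = np.linalg.svd(g_mean, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]  # factors L (m x r), R (r x n)

# Toy example: 8 per-example gradients for a 16x10 weight matrix.
grads = 1e-3 * rng.standard_normal((8, 16, 10))
L, R = low_rank_gradient(grads, rank=3)

delta = 1e-3  # assumed minimum programmable conductance step (hypothetical)

# rank-sum: reconstruct the full update externally, then round once.
update_rank_sum = stochastic_round(L @ R, delta)

# rank-seq: apply one rank-1 outer product at a time, rounding each,
# as would happen when reconstruction occurs inside the memristor array.
update_rank_seq = sum(stochastic_round(np.outer(L[:, k], R[k]), delta)
                      for k in range(L.shape[1]))
```

The two update paths at the end mirror the abstract's distinction: rank-sum reconstructs the full update externally to the array before programming, while rank-seq transfers one rank-1 component at a time, so each component is rounded separately inside the memristor array.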
format | Online Article Text |
id | pubmed-8645649 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8645649 2021-12-07 Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices Zhao, Junyun Huang, Siyuan Yousuf, Osama Gao, Yutong Hoskins, Brian D. Adam, Gina C. Front Neurosci Neuroscience While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators. Frontiers Media S.A. 2021-11-22 /pmc/articles/PMC8645649/ /pubmed/34880721 http://dx.doi.org/10.3389/fnins.2021.749811 Text en Copyright © 2021 Zhao, Huang, Yousuf, Gao, Hoskins and Adam. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Zhao, Junyun Huang, Siyuan Yousuf, Osama Gao, Yutong Hoskins, Brian D. Adam, Gina C. Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_full | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_fullStr | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_full_unstemmed | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_short | Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices |
title_sort | gradient decomposition methods for training neural networks with non-ideal synaptic devices |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645649/ https://www.ncbi.nlm.nih.gov/pubmed/34880721 http://dx.doi.org/10.3389/fnins.2021.749811 |
work_keys_str_mv | AT zhaojunyun gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT huangsiyuan gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT yousufosama gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT gaoyutong gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT hoskinsbriand gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices AT adamginac gradientdecompositionmethodsfortrainingneuralnetworkswithnonidealsynapticdevices |