Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition

Artificial neural networks (ANN) have become the mainstream acoustic modeling technique for large vocabulary automatic speech recognition (ASR). A conventional ANN features a multi-layer architecture that requires massive amounts of computation. The brain-inspired spiking neural networks (SNN) close...

Bibliographic Details
Main authors: Wu, Jibin, Yılmaz, Emre, Zhang, Malu, Li, Haizhou, Tan, Kay Chen
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7090229/
https://www.ncbi.nlm.nih.gov/pubmed/32256308
http://dx.doi.org/10.3389/fnins.2020.00199
_version_ 1783509890434596864
author Wu, Jibin
Yılmaz, Emre
Zhang, Malu
Li, Haizhou
Tan, Kay Chen
author_facet Wu, Jibin
Yılmaz, Emre
Zhang, Malu
Li, Haizhou
Tan, Kay Chen
author_sort Wu, Jibin
collection PubMed
description Artificial neural networks (ANN) have become the mainstream acoustic modeling technique for large vocabulary automatic speech recognition (ASR). A conventional ANN features a multi-layer architecture that requires massive amounts of computation. The brain-inspired spiking neural networks (SNN) closely mimic biological neural networks and can operate on low-power neuromorphic hardware with spike-based computation. Motivated by their unprecedented energy efficiency and rapid information processing capability, we explore the use of SNNs for speech recognition. In this work, we use SNNs for acoustic modeling and evaluate their performance on several large vocabulary recognition scenarios. The experimental results demonstrate ASR accuracies competitive with those of their ANN counterparts, while requiring only 10 algorithmic time steps and as few as 0.68 times the total synaptic operations to classify each audio frame. Integrating the algorithmic power of deep SNNs with energy-efficient neuromorphic hardware therefore offers an attractive solution for ASR applications running locally on mobile and embedded devices.
format Online
Article
Text
id pubmed-7090229
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-70902292020-03-31 Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition Wu, Jibin Yılmaz, Emre Zhang, Malu Li, Haizhou Tan, Kay Chen Front Neurosci Neuroscience Artificial neural networks (ANN) have become the mainstream acoustic modeling technique for large vocabulary automatic speech recognition (ASR). A conventional ANN features a multi-layer architecture that requires massive amounts of computation. The brain-inspired spiking neural networks (SNN) closely mimic biological neural networks and can operate on low-power neuromorphic hardware with spike-based computation. Motivated by their unprecedented energy efficiency and rapid information processing capability, we explore the use of SNNs for speech recognition. In this work, we use SNNs for acoustic modeling and evaluate their performance on several large vocabulary recognition scenarios. The experimental results demonstrate ASR accuracies competitive with those of their ANN counterparts, while requiring only 10 algorithmic time steps and as few as 0.68 times the total synaptic operations to classify each audio frame. Integrating the algorithmic power of deep SNNs with energy-efficient neuromorphic hardware therefore offers an attractive solution for ASR applications running locally on mobile and embedded devices. Frontiers Media S.A. 2020-03-17 /pmc/articles/PMC7090229/ /pubmed/32256308 http://dx.doi.org/10.3389/fnins.2020.00199 Text en Copyright © 2020 Wu, Yılmaz, Zhang, Li and Tan. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Wu, Jibin
Yılmaz, Emre
Zhang, Malu
Li, Haizhou
Tan, Kay Chen
Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
title Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
title_full Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
title_fullStr Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
title_full_unstemmed Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
title_short Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
title_sort deep spiking neural networks for large vocabulary automatic speech recognition
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7090229/
https://www.ncbi.nlm.nih.gov/pubmed/32256308
http://dx.doi.org/10.3389/fnins.2020.00199
work_keys_str_mv AT wujibin deepspikingneuralnetworksforlargevocabularyautomaticspeechrecognition
AT yılmazemre deepspikingneuralnetworksforlargevocabularyautomaticspeechrecognition
AT zhangmalu deepspikingneuralnetworksforlargevocabularyautomaticspeechrecognition
AT lihaizhou deepspikingneuralnetworksforlargevocabularyautomaticspeechrecognition
AT tankaychen deepspikingneuralnetworksforlargevocabularyautomaticspeechrecognition