
Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation

In this article, we present a real-time convolutional neural network (CNN)-based speech source localization (SSL) algorithm that is robust to realistic background acoustic conditions (noise and reverberation). We have implemented and tested the proposed method on a prototype (Raspberry Pi) for real-...


Bibliographic Details
Main authors: HAO, YIYA, KÜÇÜK, ABDULLAH, GANGULY, ANSHUMAN, PANAHI, ISSA M. S.
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8112575/
https://www.ncbi.nlm.nih.gov/pubmed/33981519
http://dx.doi.org/10.1109/access.2020.3033533
author HAO, YIYA
KÜÇÜK, ABDULLAH
GANGULY, ANSHUMAN
PANAHI, ISSA M. S.
collection PubMed
description In this article, we present a real-time convolutional neural network (CNN)-based speech source localization (SSL) algorithm that is robust to realistic background acoustic conditions (noise and reverberation). We have implemented and tested the proposed method on a prototype (Raspberry Pi) for real-time operation. As the input feature, we use the real and imaginary coefficients of the short-time Fourier transform (STFT) combined with Spectral Flux (SF) computed with delay-and-sum (DAS) beamforming. We trained the CNN model on noisy speech recordings collected from different rooms and ran inference on an unseen room. We provide a quantitative comparison with five previously published SSL algorithms under several realistic noisy conditions, and show significant improvements from incorporating SF with beamforming as an additional feature to learn temporal variation in speech spectra. We perform real-time inference with our CNN model on the prototyped platform with low latency (21 milliseconds (ms) per frame with a frame length of 30 ms) and high accuracy (e.g., 89.68% under the babble noise condition at 5 dB SNR). Lastly, we provide a detailed explanation of the real-time implementation and on-device performance (including peak power consumption metrics), which sets this work apart from previously published works. This work has several notable implications for improving audio-processing algorithms for portable battery-operated smart loudspeakers and hearing improvement (HI) devices.
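The feature pipeline named in the description (STFT real and imaginary coefficients, Spectral Flux, delay-and-sum beamforming) can be illustrated with a minimal NumPy sketch. Only the 30 ms frame length comes from the abstract. The 16 kHz sampling rate, the 50% overlap, the Hann window, the integer-sample delays, the L2 form of spectral flux, and the way the features are concatenated are all assumptions made for illustration, and the function names are hypothetical. This is not the authors' implementation.

```python
import numpy as np

FS = 16_000                    # assumed sampling rate (not stated in the abstract)
FRAME_LEN = FS * 30 // 1000    # 30 ms frame, as in the abstract -> 480 samples
HOP = FRAME_LEN // 2           # assumed 50% overlap


def delay_and_sum(channels, delays):
    """Integer-sample delay-and-sum: skip `delays[i]` samples of channel i, then average."""
    n = channels.shape[1] - max(delays)
    aligned = np.stack([ch[d:d + n] for ch, d in zip(channels, delays)])
    return aligned.mean(axis=0)


def stft_frames(x, frame_len, hop):
    """Hann-windowed STFT; complex array of shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def spectral_flux(spec):
    """L2 norm of the frame-to-frame change in magnitude; the first frame gets 0."""
    diff = np.diff(np.abs(spec), axis=0)
    return np.concatenate([[0.0], np.sqrt((diff ** 2).sum(axis=1))])


def frame_features(channels, delays):
    """Per-frame vector: real parts, imaginary parts, then one spectral-flux value.

    The layout is an arbitrary choice for this sketch, not the paper's CNN input.
    """
    spec = stft_frames(delay_and_sum(channels, delays), FRAME_LEN, HOP)
    flux = spectral_flux(spec)
    return np.concatenate([spec.real, spec.imag, flux[:, None]], axis=1)


# Toy two-microphone example: the second channel lags the first by 3 samples.
rng = np.random.default_rng(0)
source = rng.standard_normal(FS // 10)
mics = np.stack([source, np.concatenate([np.zeros(3), source[:-3]])])
features = frame_features(mics, delays=[0, 3])
```

With the assumed settings each frame yields 241 real parts, 241 imaginary parts, and one flux value. A practical system would estimate the steering delays rather than take them as given.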
format Online
Article
Text
id pubmed-8112575
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-8112575 2021-05-11 Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation HAO, YIYA KÜÇÜK, ABDULLAH GANGULY, ANSHUMAN PANAHI, ISSA M. S. IEEE Access Article 2020-10-26 2020 /pmc/articles/PMC8112575/ /pubmed/33981519 http://dx.doi.org/10.1109/access.2020.3033533 Text en This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
title Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation
topic Article