Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation

In this article, we present a real-time convolutional neural network (CNN)-based speech source localization (SSL) algorithm that is robust to realistic background acoustic conditions (noise and reverberation). We have implemented and tested the proposed method on a prototype (Raspberry Pi) for real-...

Bibliographic Details
Main Authors:	HAO, YIYA, KÜÇÜK, ABDULLAH, GANGULY, ANSHUMAN, PANAHI, ISSA M. S.
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8112575/ https://www.ncbi.nlm.nih.gov/pubmed/33981519 http://dx.doi.org/10.1109/access.2020.3033533

_version_	1783690701228212224
author	HAO, YIYA KÜÇÜK, ABDULLAH GANGULY, ANSHUMAN PANAHI, ISSA M. S.
author_facet	HAO, YIYA KÜÇÜK, ABDULLAH GANGULY, ANSHUMAN PANAHI, ISSA M. S.
author_sort	HAO, YIYA
collection	PubMed
description	In this article, we present a real-time convolutional neural network (CNN)-based speech source localization (SSL) algorithm that is robust to realistic background acoustic conditions (noise and reverberation). We have implemented and tested the proposed method on a prototype (Raspberry Pi) for real-time operation. We have used the combination of the imaginary-real coefficients of the short-time Fourier transform (STFT) and Spectral Flux (SF) with delay-and-sum (DAS) beamforming as the input feature. We have trained the CNN model using noisy speech recordings collected from different rooms and performed inference on an unseen room. We provide a quantitative comparison with five other previously published SSL algorithms under several realistic noisy conditions, and show significant improvements by incorporating the Spectral Flux (SF) with beamforming as an additional feature to learn temporal variation in speech spectra. We perform real-time inferencing of our CNN model on the prototyped platform with low latency (21 milliseconds (ms) per frame with a frame length of 30 ms) and high accuracy (e.g., 89.68% under babble noise at 5 dB SNR). Lastly, we provide a detailed explanation of real-time implementation and on-device performance (including peak power consumption metrics) that sets this work apart from previously published works. This work has several notable implications for improving the audio-processing algorithms for portable battery-operated smart loudspeakers and hearing improvement (HI) devices.
format	Online Article Text
id	pubmed-8112575
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-8112575 2021-05-11 Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation HAO, YIYA KÜÇÜK, ABDULLAH GANGULY, ANSHUMAN PANAHI, ISSA M. S. IEEE Access Article In this article, we present a real-time convolutional neural network (CNN)-based speech source localization (SSL) algorithm that is robust to realistic background acoustic conditions (noise and reverberation). We have implemented and tested the proposed method on a prototype (Raspberry Pi) for real-time operation. We have used the combination of the imaginary-real coefficients of the short-time Fourier transform (STFT) and Spectral Flux (SF) with delay-and-sum (DAS) beamforming as the input feature. We have trained the CNN model using noisy speech recordings collected from different rooms and performed inference on an unseen room. We provide a quantitative comparison with five other previously published SSL algorithms under several realistic noisy conditions, and show significant improvements by incorporating the Spectral Flux (SF) with beamforming as an additional feature to learn temporal variation in speech spectra. We perform real-time inferencing of our CNN model on the prototyped platform with low latency (21 milliseconds (ms) per frame with a frame length of 30 ms) and high accuracy (e.g., 89.68% under babble noise at 5 dB SNR). Lastly, we provide a detailed explanation of real-time implementation and on-device performance (including peak power consumption metrics) that sets this work apart from previously published works. This work has several notable implications for improving the audio-processing algorithms for portable battery-operated smart loudspeakers and hearing improvement (HI) devices. 2020-10-26 2020 /pmc/articles/PMC8112575/ /pubmed/33981519 http://dx.doi.org/10.1109/access.2020.3033533 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
spellingShingle	Article HAO, YIYA KÜÇÜK, ABDULLAH GANGULY, ANSHUMAN PANAHI, ISSA M. S. Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation
title	Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation
title_full	Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation
title_fullStr	Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation
title_full_unstemmed	Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation
title_short	Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and Its Real-Time Implementation
title_sort	spectral flux-based convolutional neural network architecture for speech source localization and its real-time implementation
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8112575/ https://www.ncbi.nlm.nih.gov/pubmed/33981519 http://dx.doi.org/10.1109/access.2020.3033533
work_keys_str_mv	AT haoyiya spectralfluxbasedconvolutionalneuralnetworkarchitectureforspeechsourcelocalizationanditsrealtimeimplementation AT kucukabdullah spectralfluxbasedconvolutionalneuralnetworkarchitectureforspeechsourcelocalizationanditsrealtimeimplementation AT gangulyanshuman spectralfluxbasedconvolutionalneuralnetworkarchitectureforspeechsourcelocalizationanditsrealtimeimplementation AT panahiissams spectralfluxbasedconvolutionalneuralnetworkarchitectureforspeechsourcelocalizationanditsrealtimeimplementation

Similar Items