
Sound Source Localization Using a Convolutional Neural Network and Regression Model

In this research, a novel sound source localization model is introduced that integrates a convolutional neural network with a regression model (CNN-R) to estimate the sound source angle and distance based on the acoustic characteristics of the interaural phase difference (IPD). The IPD features of t...


Bibliographic Details
Main authors: Tan, Tan-Hsu, Lin, Yu-Tang, Chang, Yang-Lang, Alkhaleefah, Mohammad
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659937/
https://www.ncbi.nlm.nih.gov/pubmed/34884042
http://dx.doi.org/10.3390/s21238031
collection PubMed
description This research introduces a novel sound source localization model that integrates a convolutional neural network with a regression model (CNN-R) to estimate the angle and distance of a sound source from the acoustic characteristics of the interaural phase difference (IPD). The IPD features of the sound signal are first extracted in the time-frequency domain using the short-time Fourier transform (STFT). The resulting IPD feature map is then fed to the CNN-R model as an image for sound source localization. The Pyroomacoustics platform and the multichannel impulse response database (MIRD) are used to generate simulated and real room impulse response (RIR) datasets, respectively. In the simulation scenario at SNR = 30 dB and RT60 = 0.16 s, the proposed CNN-R achieves average accuracies of 98.96% and 98.31% for angle and distance estimation, respectively. In the real environment, under the same SNR and RT60 conditions, the average accuracies of the angle and distance estimations are 99.85% and 99.38%, respectively. The performance obtained in both scenarios is superior to that of existing models, indicating the potential of the proposed CNN-R model for real-life applications.
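The IPD extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stft` and `ipd_map`, the Hann window, the frame length of 512 samples, the hop of 256 samples, and the 16 kHz sampling rate are all assumptions chosen for illustration, since the abstract does not state them. The sketch computes the STFT of two microphone channels with NumPy and takes the phase of their cross-spectrum as the IPD in each time-frequency bin.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def ipd_map(left, right, n_fft=512, hop=256):
    """Phase difference (left minus right) per bin, wrapped to [-pi, pi]."""
    spec_left = stft(left, n_fft, hop)
    spec_right = stft(right, n_fft, hop)
    # The angle of the cross-spectrum is the phase of left minus the phase of right.
    return np.angle(spec_left * np.conj(spec_right))


# Hypothetical test signal: a 1 kHz tone that reaches the right channel
# 4 samples later than the left channel (assumed values, not from the paper).
fs = 16000
n = np.arange(fs)
delay = 4
left = np.sin(2 * np.pi * 1000 * n / fs)
right = np.sin(2 * np.pi * 1000 * (n - delay) / fs)

ipd = ipd_map(left, right)
tone_bin = 1000 * 512 // fs  # index of the frequency bin centred on 1 kHz
expected_phase = 2 * np.pi * 1000 * delay / fs  # phase lag implied by the delay
```

In this toy case the IPD at the tone's frequency bin matches the phase lag implied by the inter-channel delay. A two-dimensional map of this kind (frequency by time) is what the abstract describes as being passed to the CNN-R model as an image.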
format Online
Article
Text
id pubmed-8659937
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8659937 2021-12-10 Sound Source Localization Using a Convolutional Neural Network and Regression Model Tan, Tan-Hsu Lin, Yu-Tang Chang, Yang-Lang Alkhaleefah, Mohammad Sensors (Basel) Article MDPI 2021-12-01 /pmc/articles/PMC8659937/ /pubmed/34884042 http://dx.doi.org/10.3390/s21238031 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).