
Sound Source Localization Using a Convolutional Neural Network and Regression Model

In this research, a novel sound source localization model is introduced that integrates a convolutional neural network with a regression model (CNN-R) to estimate the sound source angle and distance based on the acoustic characteristics of the interaural phase difference (IPD). The IPD features of t...
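The full description in this record states that the IPD features are extracted in the time-frequency domain with a short-time Fourier transform (STFT). The following is a minimal sketch of that idea only, not the authors' implementation: the function names `stft` and `ipd_map`, the frame length, the hop size, the Hann window, and the toy two-channel signal are all assumptions made for illustration. It uses only the Python standard library, so the DFT is computed directly rather than with an FFT routine.

```python
import cmath
import math


def stft(signal, frame_len=64, hop=32):
    """Naive STFT: Hann-windowed frames, direct DFT, non-negative frequency bins only."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def ipd_map(left, right, frame_len=64, hop=32):
    """IPD per time-frequency bin: phase of L(t, f) * conj(R(t, f)), in radians."""
    spec_left = stft(left, frame_len, hop)
    spec_right = stft(right, frame_len, hop)
    return [
        [cmath.phase(l * r.conjugate()) for l, r in zip(frame_l, frame_r)]
        for frame_l, frame_r in zip(spec_left, spec_right)
    ]


# Toy input (assumed): a 1 kHz tone that reaches the right channel 2 samples late.
fs = 8000          # sampling rate in Hz
freq = 1000.0      # tone frequency in Hz
delay = 2          # inter-channel delay in samples
n_samples = 256
left = [math.sin(2 * math.pi * freq * n / fs) for n in range(n_samples)]
right = [math.sin(2 * math.pi * freq * (n - delay) / fs) for n in range(n_samples)]

ipd = ipd_map(left, right)
tone_bin = round(freq * 64 / fs)                 # frequency bin holding the tone
expected = 2 * math.pi * freq * delay / fs       # phase lag implied by the delay
print(len(ipd), len(ipd[0]), ipd[0][tone_bin], expected)
```

The resulting list of lists (time frames by frequency bins) is the kind of two-dimensional map that the description says is passed to the CNN-R model as an image. Bins with little signal energy carry essentially arbitrary phase values, so a practical pipeline would likely need to handle them; how the paper does so is not stated in this record.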


Bibliographic Details
Main authors:	Tan, Tan-Hsu; Lin, Yu-Tang; Chang, Yang-Lang; Alkhaleefah, Mohammad
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659937/ https://www.ncbi.nlm.nih.gov/pubmed/34884042 http://dx.doi.org/10.3390/s21238031

collection	PubMed
description	In this research, a novel sound source localization model is introduced that integrates a convolutional neural network with a regression model (CNN-R) to estimate the sound source angle and distance based on the acoustic characteristics of the interaural phase difference (IPD). The IPD features of the sound signal are first extracted in the time-frequency domain by short-time Fourier transform (STFT). Then, the IPD feature map is fed to the CNN-R model as an image for sound source localization. The Pyroomacoustics platform and the multichannel impulse response database (MIRD) are used to generate simulated and real room impulse response (RIR) datasets, respectively. The experimental results show that the proposed CNN-R achieves average accuracies of 98.96% and 98.31% for angle and distance estimation, respectively, in the simulation scenario at SNR = 30 dB and RT60 = 0.16 s. Moreover, in the real environment, the average accuracies of the angle and distance estimations are 99.85% and 99.38%, respectively, at SNR = 30 dB and RT60 = 0.16 s. The performance obtained in both scenarios is superior to that of existing models, indicating the potential of the proposed CNN-R model for real-life applications.
id	pubmed-8659937
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-8659937 2021-12-10 Sensors (Basel) Article MDPI 2021-12-01 /pmc/articles/PMC8659937/ /pubmed/34884042 http://dx.doi.org/10.3390/s21238031 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
