Sound Source Localization Using a Convolutional Neural Network and Regression Model
In this research, a novel sound source localization model is introduced that integrates a convolutional neural network with a regression model (CNN-R) to estimate the sound source angle and distance based on the acoustic characteristics of the interaural phase difference (IPD). The IPD features of t...
Main authors: | Tan, Tan-Hsu; Lin, Yu-Tang; Chang, Yang-Lang; Alkhaleefah, Mohammad |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659937/ https://www.ncbi.nlm.nih.gov/pubmed/34884042 http://dx.doi.org/10.3390/s21238031 |
author | Tan, Tan-Hsu; Lin, Yu-Tang; Chang, Yang-Lang; Alkhaleefah, Mohammad |
---|---|
collection | PubMed |
description | In this research, a novel sound source localization model is introduced that integrates a convolutional neural network with a regression model (CNN-R) to estimate the sound source angle and distance from the acoustic characteristics of the interaural phase difference (IPD). The IPD features of the sound signal are first extracted in the time-frequency domain using the short-time Fourier transform (STFT). The resulting IPD feature map is then fed to the CNN-R model as an image for sound source localization. The Pyroomacoustics platform and the multichannel impulse response database (MIRD) are used to generate the simulated and real room impulse response (RIR) datasets, respectively. The experimental results show that the proposed CNN-R achieves average accuracies of 98.96% and 98.31% for angle and distance estimation, respectively, in the simulation scenario at SNR = 30 dB and RT60 = 0.16 s. In the real environment, the average accuracies of angle and distance estimation are 99.85% and 99.38%, respectively, under the same conditions (SNR = 30 dB, RT60 = 0.16 s). The performance obtained in both scenarios is superior to that of existing models, indicating the potential of the proposed CNN-R model for real-life applications. |
id | pubmed-8659937 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-8659937 (2021-12-10); Sensors (Basel); Article; MDPI, 2021-12-01; /pmc/articles/PMC8659937/ /pubmed/34884042 http://dx.doi.org/10.3390/s21238031; Text; en; © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
topic | Article |
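The abstract's first step, extracting an IPD feature map from two microphone signals with the STFT, can be illustrated with a minimal NumPy sketch. It assumes two microphones, a far-field source in a free field (no reverberation or noise), an integer-sample inter-microphone delay, a speed of sound of 343 m/s, a 0.2 m microphone spacing, and STFT settings (512-point Hann window, hop 256) chosen purely for illustration; the paper's actual parameters are not stated in this record. The helper names `stft`, `ipd_map`, and `estimate_angle` are hypothetical. The slope-based angle estimate is a classical baseline swapped in only to show that the IPD map encodes direction (IPD(f) = 2πfτ with τ = d·sin θ / c); it is not the CNN-R model.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value (room temperature)


def stft(x: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Hann-windowed STFT; returns a complex array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)


def ipd_map(left: np.ndarray, right: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """IPD feature map (frequency x time): phase of X_left * conj(X_right), in [-pi, pi]."""
    X_l = stft(left, n_fft, hop)
    X_r = stft(right, n_fft, hop)
    return np.angle(X_l * np.conj(X_r))


def estimate_angle(left, right, fs, mic_distance, n_fft=512, hop=256, max_bin=25):
    """Far-field angle from broadside, from the IPD slope over low-frequency bins.

    The bins 1..max_bin stay free of phase wrapping while |delay| < n_fft / (2 * max_bin)
    samples (10.24 here), which covers the 9.33-sample maximum for 0.2 m at 16 kHz.
    """
    X_l = stft(left, n_fft, hop)
    X_r = stft(right, n_fft, hop)
    # Average the cross-spectrum over frames before taking its phase.
    phase = np.angle(np.mean(X_l * np.conj(X_r), axis=1))
    k = np.arange(1, max_bin + 1)
    omega = 2 * np.pi * k / n_fft  # normalized angular frequency, rad/sample
    # Least-squares slope through the origin: phase ~= omega * delay (delay in samples).
    delay = np.sum(omega * phase[k]) / np.sum(omega ** 2)
    tau = delay / fs  # seconds
    sin_theta = np.clip(SPEED_OF_SOUND * tau / mic_distance, -1.0, 1.0)
    return delay, np.degrees(np.arcsin(sin_theta))


# Synthetic check: white noise reaching the right microphone 4 samples after the left one.
fs = 16_000
mic_distance = 0.2  # m, hypothetical microphone spacing
true_delay = 4      # samples
rng = np.random.default_rng(0)
s = rng.standard_normal(fs + true_delay)
left, right = s[true_delay:], s[:-true_delay]  # right[n] == left[n - true_delay]

ipd = ipd_map(left, right)
delay, angle = estimate_angle(left, right, fs, mic_distance)
print(ipd.shape)  # (257, 61): frequency bins x frames, the "image" a CNN would consume
print(round(float(delay), 2), round(float(angle), 1))
```

In this sketch a positive angle means the source is closer to the left microphone. A learned model such as CNN-R would take the full `ipd` array as input instead of fitting a single slope, which lets it cope with reverberation and also regress distance.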