
Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks

Bibliographic Details
Main Authors: Chun, Chanjun, Jeon, Kwang Myung, Choi, Wooyeol
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7374402/
https://www.ncbi.nlm.nih.gov/pubmed/32635619
http://dx.doi.org/10.3390/s20133768
_version_ 1783561690852360192
author Chun, Chanjun
Jeon, Kwang Myung
Choi, Wooyeol
author_facet Chun, Chanjun
Jeon, Kwang Myung
Choi, Wooyeol
author_sort Chun, Chanjun
collection PubMed
description Deep neural networks (DNNs) have achieved significant advancements in speech processing, and numerous types of DNN architectures have been proposed in the field of sound localization. When a DNN model is deployed for sound localization, a fixed input size is required; this is generally determined by the number of microphones, the fast Fourier transform size, and the frame size. If the number or configuration of the microphones changes, the DNN model must be retrained because the size of the input features changes. In this paper, we propose a configuration-invariant sound localization technique using the azimuth-frequency representation and convolutional neural networks (CNNs). The proposed CNN model receives the azimuth-frequency representation, instead of time-frequency features, as its input. The proposed model was evaluated in microphone configurations different from the one in which it was originally trained. For the evaluation, a single sound source was simulated using the image method. The evaluations confirmed that the localization performance was superior to that of the conventional steered response power phase transform (SRP-PHAT) and multiple signal classification (MUSIC) methods.
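The description hinges on an input representation whose size depends only on the azimuth grid and the FFT length, not on the number or placement of microphones. The sketch below is not the authors' implementation; it is a minimal illustration of one plausible way to build such an azimuth-frequency map, using a steered-response-power computation with PHAT whitening under a far-field, planar-array assumption. The function name, array geometry, azimuth resolution, and sampling rate are all illustrative.

```python
import numpy as np

def azimuth_frequency_map(x, mic_xy, fs, n_fft=512, n_azimuth=72, c=343.0):
    """Compute an SRP-PHAT-style azimuth-frequency map for one frame.

    x       : (n_mics, n_fft) time-domain frame, one row per microphone
    mic_xy  : (n_mics, 2) microphone coordinates in metres
    Returns : (n_azimuth, n_fft // 2 + 1) map whose shape is independent
              of the number or placement of microphones.
    """
    n_mics = x.shape[0]
    X = np.fft.rfft(x * np.hanning(n_fft), n=n_fft, axis=1)   # (n_mics, n_bins)
    X /= np.abs(X) + 1e-12                                    # PHAT whitening
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)                # (n_bins,)
    azimuths = np.linspace(0.0, 2 * np.pi, n_azimuth, endpoint=False)

    # Far-field steering: plane wave arriving from each candidate azimuth.
    directions = np.stack([np.cos(azimuths), np.sin(azimuths)], axis=1)  # (n_az, 2)
    delays = mic_xy @ directions.T / c                        # (n_mics, n_az), seconds
    steering = np.exp(2j * np.pi * delays[:, :, None] * freqs[None, None, :])

    # Coherently sum the whitened spectra over microphones for each azimuth,
    # then take the power; dividing by n_mics keeps the scale comparable
    # across arrays with different numbers of microphones.
    aligned = (X[:, None, :] * steering).sum(axis=0) / n_mics  # (n_az, n_bins)
    return np.abs(aligned) ** 2


if __name__ == "__main__":
    fs, n_fft = 16000, 512
    rng = np.random.default_rng(0)
    # Hypothetical 4-microphone square array with 5 cm half-width.
    mic_xy = np.array([[0.05, 0.05], [-0.05, 0.05], [-0.05, -0.05], [0.05, -0.05]])
    frame = rng.standard_normal((4, n_fft))
    af = azimuth_frequency_map(frame, mic_xy, fs, n_fft)
    print(af.shape)  # (72, 257): fixed size regardless of microphone count
```

Because the resulting map has shape (n_azimuth, n_fft // 2 + 1) for any array, a CNN trained on it never sees the microphone count directly, which is the configuration invariance the abstract refers to.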
format Online
Article
Text
id pubmed-7374402
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7374402 2020-08-06 Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks Chun, Chanjun Jeon, Kwang Myung Choi, Wooyeol Sensors (Basel) Letter Deep neural networks (DNNs) have achieved significant advancements in speech processing, and numerous types of DNN architectures have been proposed in the field of sound localization. When a DNN model is deployed for sound localization, a fixed input size is required; this is generally determined by the number of microphones, the fast Fourier transform size, and the frame size. If the number or configuration of the microphones changes, the DNN model must be retrained because the size of the input features changes. In this paper, we propose a configuration-invariant sound localization technique using the azimuth-frequency representation and convolutional neural networks (CNNs). The proposed CNN model receives the azimuth-frequency representation, instead of time-frequency features, as its input. The proposed model was evaluated in microphone configurations different from the one in which it was originally trained. For the evaluation, a single sound source was simulated using the image method. The evaluations confirmed that the localization performance was superior to that of the conventional steered response power phase transform (SRP-PHAT) and multiple signal classification (MUSIC) methods. MDPI 2020-07-05 /pmc/articles/PMC7374402/ /pubmed/32635619 http://dx.doi.org/10.3390/s20133768 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Letter
Chun, Chanjun
Jeon, Kwang Myung
Choi, Wooyeol
Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks
title Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks
title_full Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks
title_fullStr Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks
title_full_unstemmed Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks
title_short Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks
title_sort configuration-invariant sound localization technique using azimuth-frequency representation and convolutional neural networks
topic Letter
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7374402/
https://www.ncbi.nlm.nih.gov/pubmed/32635619
http://dx.doi.org/10.3390/s20133768
work_keys_str_mv AT chunchanjun configurationinvariantsoundlocalizationtechniqueusingazimuthfrequencyrepresentationandconvolutionalneuralnetworks
AT jeonkwangmyung configurationinvariantsoundlocalizationtechniqueusingazimuthfrequencyrepresentationandconvolutionalneuralnetworks
AT choiwooyeol configurationinvariantsoundlocalizationtechniqueusingazimuthfrequencyrepresentationandconvolutionalneuralnetworks