Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks
Deep neural networks (DNNs) have achieved significant advancements in speech processing, and numerous types of DNN architectures have been proposed in the field of sound localization. When a DNN model is deployed for sound localization, a fixed input size is required. This is generally determined by...
Main Authors: | Chun, Chanjun; Jeon, Kwang Myung; Choi, Wooyeol |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2020 |
Subjects: | Letter |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7374402/ https://www.ncbi.nlm.nih.gov/pubmed/32635619 http://dx.doi.org/10.3390/s20133768 |
_version_ | 1783561690852360192 |
---|---|
author | Chun, Chanjun Jeon, Kwang Myung Choi, Wooyeol |
author_facet | Chun, Chanjun Jeon, Kwang Myung Choi, Wooyeol |
author_sort | Chun, Chanjun |
collection | PubMed |
description | Deep neural networks (DNNs) have achieved significant advancements in speech processing, and numerous types of DNN architectures have been proposed in the field of sound localization. When a DNN model is deployed for sound localization, a fixed input size is required. This is generally determined by the number of microphones, the fast Fourier transform size, and the frame size. If the number or configuration of the microphones changes, the DNN model must be retrained because the size of the input features changes. In this paper, we propose a configuration-invariant sound localization technique using the azimuth-frequency representation and convolutional neural networks (CNNs). The proposed CNN model receives the azimuth-frequency representation, instead of time-frequency features, as its input. The proposed model was evaluated with microphone configurations different from the one on which it was originally trained. For evaluation, a single sound source was simulated using the image method. The evaluations confirmed that the localization performance was superior to that of the conventional steered response power phase transform (SRP-PHAT) and multiple signal classification (MUSIC) methods. |
format | Online Article Text |
id | pubmed-7374402 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-73744022020-08-06 Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks Chun, Chanjun Jeon, Kwang Myung Choi, Wooyeol Sensors (Basel) Letter Deep neural networks (DNNs) have achieved significant advancements in speech processing, and numerous types of DNN architectures have been proposed in the field of sound localization. When a DNN model is deployed for sound localization, a fixed input size is required. This is generally determined by the number of microphones, the fast Fourier transform size, and the frame size. If the number or configuration of the microphones changes, the DNN model must be retrained because the size of the input features changes. In this paper, we propose a configuration-invariant sound localization technique using the azimuth-frequency representation and convolutional neural networks (CNNs). The proposed CNN model receives the azimuth-frequency representation, instead of time-frequency features, as its input. The proposed model was evaluated with microphone configurations different from the one on which it was originally trained. For evaluation, a single sound source was simulated using the image method. The evaluations confirmed that the localization performance was superior to that of the conventional steered response power phase transform (SRP-PHAT) and multiple signal classification (MUSIC) methods. MDPI 2020-07-05 /pmc/articles/PMC7374402/ /pubmed/32635619 http://dx.doi.org/10.3390/s20133768 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Letter Chun, Chanjun Jeon, Kwang Myung Choi, Wooyeol Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks |
title | Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks |
title_full | Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks |
title_fullStr | Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks |
title_full_unstemmed | Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks |
title_short | Configuration-Invariant Sound Localization Technique Using Azimuth-Frequency Representation and Convolutional Neural Networks |
title_sort | configuration-invariant sound localization technique using azimuth-frequency representation and convolutional neural networks |
topic | Letter |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7374402/ https://www.ncbi.nlm.nih.gov/pubmed/32635619 http://dx.doi.org/10.3390/s20133768 |
work_keys_str_mv | AT chunchanjun configurationinvariantsoundlocalizationtechniqueusingazimuthfrequencyrepresentationandconvolutionalneuralnetworks AT jeonkwangmyung configurationinvariantsoundlocalizationtechniqueusingazimuthfrequencyrepresentationandconvolutionalneuralnetworks AT choiwooyeol configurationinvariantsoundlocalizationtechniqueusingazimuthfrequencyrepresentationandconvolutionalneuralnetworks |
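The central idea in the abstract is that the network input is an azimuth-frequency map rather than per-microphone time-frequency features, so the input dimensions depend only on the azimuth grid and the FFT size, not on the number or placement of microphones. The sketch below is a hypothetical illustration of such a map using an SRP-PHAT-style steered response per frequency bin; the exact feature definition used in the paper may differ, and all function and variable names here are assumptions for illustration only.

```python
import numpy as np

def azimuth_frequency_map(frames, mic_xy, fs=16000, n_fft=512,
                          n_azimuths=72, c=343.0):
    """Hypothetical azimuth-frequency map (SRP-PHAT-style sketch).

    frames : (n_mics, n_fft) array, one time-domain frame per microphone.
    mic_xy : (n_mics, 2) microphone coordinates in metres (planar array).
    Returns a (n_azimuths, n_fft//2 + 1) map whose shape is independent
    of the number and geometry of microphones.
    """
    # Per-channel spectra with PHAT-style magnitude whitening.
    spec = np.fft.rfft(frames * np.hanning(n_fft), n=n_fft, axis=1)
    spec /= np.abs(spec) + 1e-12

    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)                     # (n_bins,)
    azimuths = np.linspace(0.0, 2 * np.pi, n_azimuths, endpoint=False)

    # Far-field look directions and the resulting per-channel delays.
    look = np.stack([np.cos(azimuths), np.sin(azimuths)], axis=1)  # (A, 2)
    delays = mic_xy @ look.T / c                                   # (n_mics, A)

    # Steer each channel toward each candidate azimuth and sum coherently.
    phase = np.exp(2j * np.pi * freqs[None, :, None] * delays[:, None, :])
    steered = np.sum(spec[:, :, None] * phase, axis=0)             # (n_bins, A)
    return np.abs(steered).T                                       # (A, n_bins)

# Usage: an 8-microphone circular array of 5 cm radius; the output shape
# stays (72, 257) for any other array, since only the azimuth grid and
# FFT size determine it.
rng = np.random.default_rng(0)
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
mic_xy = 0.05 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
frames = rng.standard_normal((8, 512))
print(azimuth_frequency_map(frames, mic_xy).shape)   # (72, 257)
```

Because the map's size is fixed by the azimuth grid and the FFT size, the same CNN input layer can, in principle, be reused across array configurations, which is the configuration invariance the title refers to.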