Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates

Bibliographic Details
Main authors: Vera-Diaz, Juan Manuel; Pizarro, Daniel; Macias-Guarasa, Javier
Format: Online, Article, Text
Language: English
Published: MDPI 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6210564/
https://www.ncbi.nlm.nih.gov/pubmed/30322007
http://dx.doi.org/10.3390/s18103418
collection PubMed
description This paper presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). In the proposed solution, the CNN is designed to directly estimate the three-dimensional position of a single acoustic source using the raw audio signal as the input information, avoiding the use of hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train our network using semi-synthetic data generated from close-talk speech recordings, simulating the time delays and distortion the signal undergoes as it propagates from the source to the microphone array. We then fine-tune this network using a small amount of real data. Our experimental results, evaluated on a publicly available dataset recorded in a real room, show that this approach produces networks that significantly outperform existing localization methods based on SRP-PHAT strategies, as well as very recent proposals based on Convolutional Recurrent Neural Networks (CRNN). In addition, our experiments show that the performance of our CNN method has no relevant dependency on the speaker’s gender or on the size of the signal window used.
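The semi-synthetic data step described in the abstract (delaying and distorting a close-talk recording as if it had travelled from a source position to each microphone) can be illustrated with a minimal sketch. It assumes a free-field direct path only, integer-sample delays, 1/r amplitude spreading and optional white Gaussian noise as the "distortion"; the helper name `simulate_array_signals`, the 343 m/s speed of sound and the noise model are illustrative assumptions, not the authors' simulation, whose details this record does not give.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def simulate_array_signals(clean, source, mics, fs, noise_std=0.0, seed=0):
    """Hypothetical helper: turn one close-talk recording into array signals.

    Assumed propagation model (a simplification, not the paper's exact one):
    direct path only, integer-sample delay d / c, 1/r amplitude spreading,
    and optional additive white Gaussian noise standing in for distortion.
    Returns (channels, lags), where lags[i] is mic i's delay in samples.
    """
    rng = random.Random(seed)
    # Source-to-microphone distances, clamped to avoid dividing by zero.
    dists = [max(math.dist(source, m), 1e-2) for m in mics]
    # Propagation delay in whole samples for each microphone.
    lags = [round(d / SPEED_OF_SOUND * fs) for d in dists]
    # Common output length so every channel stays time-aligned.
    length = len(clean) + max(lags)
    channels = []
    for d, lag in zip(dists, lags):
        gain = 1.0 / d  # spherical spreading loss
        ch = [0.0] * length
        for n, x in enumerate(clean):
            ch[n + lag] = gain * x
        if noise_std > 0.0:
            ch = [v + rng.gauss(0.0, noise_std) for v in ch]
        channels.append(ch)
    return channels, lags


if __name__ == "__main__":
    # Example: source at the origin, two microphones 3.43 m and 1.715 m away.
    chans, lags = simulate_array_signals(
        [1.0, 0.5], (0.0, 0.0, 0.0),
        [(3.43, 0.0, 0.0), (0.0, 1.715, 0.0)], fs=16000)
    print(lags)  # per-microphone delays in samples
```

Real rooms add reverberation, so a fuller simulation would typically convolve the recording with room impulse responses; this sketch keeps only the direct path so that the time-difference-of-arrival structure (for example `lags[0] - lags[1]`) that a localizer has to learn stays easy to inspect.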
id pubmed-6210564
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6210564 2018-11-02; Sensors (Basel); Article; MDPI 2018-10-12; /pmc/articles/PMC6210564/ /pubmed/30322007 http://dx.doi.org/10.3390/s18103418; Text en; © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
topic Article