
Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach

This paper presents a sound source distance estimation (SSDE) method using a convolutional recurrent neural network (CRNN). We approach the sound source distance estimation task as an image classification problem, and we aim to classify a given audio signal into one of three predefined distance clas...


Bibliographic Details
Main authors: Yiwere, Mariam; Rhee, Eun Joo
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6982911/
https://www.ncbi.nlm.nih.gov/pubmed/31892213
http://dx.doi.org/10.3390/s20010172
_version_ 1783491398021939200
author Yiwere, Mariam
Rhee, Eun Joo
author_facet Yiwere, Mariam
Rhee, Eun Joo
author_sort Yiwere, Mariam
collection PubMed
description This paper presents a sound source distance estimation (SSDE) method using a convolutional recurrent neural network (CRNN). We approach the sound source distance estimation task as an image classification problem, and we aim to classify a given audio signal into one of three predefined distance classes—one meter, two meters, and three meters—irrespective of its orientation angle. For the purpose of training, we create a dataset by recording audio signals at the three different distances and three angles in different rooms. The CRNN is trained using time-frequency representations of the audio signals. Specifically, we transform the audio signals into log-scaled mel spectrograms, allowing the convolutional layers to extract the appropriate features required for the classification. When trained and tested with combined datasets from all rooms, the proposed model exhibits high classification accuracies; however, training and testing the model in separate rooms results in lower accuracies, indicating that further study is required to improve the method’s generalization ability. Our experimental results demonstrate that it is possible to estimate sound source distances in known environments by classification using the log-scaled mel spectrogram.
format Online
Article
Text
id pubmed-6982911
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6982911 2020-02-06 Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach Yiwere, Mariam Rhee, Eun Joo Sensors (Basel) Article This paper presents a sound source distance estimation (SSDE) method using a convolutional recurrent neural network (CRNN). We approach the sound source distance estimation task as an image classification problem, and we aim to classify a given audio signal into one of three predefined distance classes—one meter, two meters, and three meters—irrespective of its orientation angle. For the purpose of training, we create a dataset by recording audio signals at the three different distances and three angles in different rooms. The CRNN is trained using time-frequency representations of the audio signals. Specifically, we transform the audio signals into log-scaled mel spectrograms, allowing the convolutional layers to extract the appropriate features required for the classification. When trained and tested with combined datasets from all rooms, the proposed model exhibits high classification accuracies; however, training and testing the model in separate rooms results in lower accuracies, indicating that further study is required to improve the method’s generalization ability. Our experimental results demonstrate that it is possible to estimate sound source distances in known environments by classification using the log-scaled mel spectrogram. MDPI 2019-12-27 /pmc/articles/PMC6982911/ /pubmed/31892213 http://dx.doi.org/10.3390/s20010172 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Yiwere, Mariam
Rhee, Eun Joo
Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach
title Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach
title_full Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach
title_fullStr Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach
title_full_unstemmed Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach
title_short Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach
title_sort sound source distance estimation using deep learning: an image classification approach
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6982911/
https://www.ncbi.nlm.nih.gov/pubmed/31892213
http://dx.doi.org/10.3390/s20010172
work_keys_str_mv AT yiweremariam soundsourcedistanceestimationusingdeeplearninganimageclassificationapproach
AT rheeeunjoo soundsourcedistanceestimationusingdeeplearninganimageclassificationapproach
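
Illustrative note: the description in this record names log-scaled mel spectrograms as the input to the CRNN. The following is a minimal, standard-library-only sketch of how such a representation can be computed from a mono signal. It is not the authors' code. The frame length, hop size, number of mel bands, sample rate, the Hann window, and the HTK-style mel formula are all assumptions chosen for illustration; none of them come from the record.

```python
import cmath
import math


def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; other mel variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    hz_points = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_freqs:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT (slow but dependency-free)."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8, eps=1e-10):
    """Return a list of frames, each a list of n_mels log-scaled (dB) mel energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel_energy = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + eps) for e in mel_energy])
    return frames


if __name__ == "__main__":
    # Hypothetical test tone: 1 kHz sine sampled at 8 kHz.
    tone = [math.sin(2.0 * math.pi * 1000.0 * t / 8000.0) for t in range(256)]
    spec = log_mel_spectrogram(tone, 8000)
    print(len(spec), "frames x", len(spec[0]), "mel bands")
```

The resulting frames-by-bands grid is the kind of two-dimensional, image-like array that a convolutional front end can consume, which is the sense in which the description treats distance estimation as image classification. A practical pipeline would likely use an FFT-based audio library and much larger frame and band counts than this sketch.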