
Sound source localization based on residual network and channel attention module

This paper presents a sound source localization (SSL) model based on residual network and channel attention mechanism. The method takes the combination of log-Mel spectrogram and generalized cross-correlation phase transform (GCC-PHAT) as the input features, and extracts the time–frequency informati...
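As an illustration of the GCC-PHAT feature named in the abstract, the following is a minimal NumPy sketch of the generic textbook GCC-PHAT delay estimate between two microphone channels. It is not the authors' implementation: the function name `gcc_phat`, its parameters (`fs`, `max_tau`, `interp`), and the small stabilizing constant are assumptions made for this example only.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None, interp=1):
    """Estimate the delay of `sig` relative to `ref` in seconds (generic GCC-PHAT).

    Hypothetical helper for illustration; not taken from the paper.
    Returns (tau, cc), where cc is the PHAT-weighted cross-correlation
    over lags -max_shift..+max_shift.
    """
    # Zero-pad to the combined length so the correlation is linear, not circular.
    n = sig.shape[0] + ref.shape[0]

    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep the phase, drop the magnitude.
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12  # small constant (assumption) avoids division by zero

    # Back to the lag domain; interp > 1 upsamples the correlation.
    cc = np.fft.irfft(R, n=interp * n)

    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)

    # Reorder so that negative lags come first and lag 0 sits at index max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    shift = np.argmax(np.abs(cc)) - max_shift
    tau = shift / float(interp * fs)
    return tau, cc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1024)           # reference microphone (synthetic noise)
    sig = np.concatenate((np.zeros(5), ref))  # same signal arriving 5 samples later
    tau, _ = gcc_phat(sig, ref, fs=16000)
    print(f"estimated delay: {tau * 16000:.1f} samples")
```

In a feature pipeline of the kind the abstract describes, the correlation vector `cc` computed per microphone pair and per frame would be stacked alongside the log-Mel spectrogram channels; how the authors arrange these inputs is not stated in this record.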


Bibliographic Details
Main authors: Hu, Fucai, Song, Xiaohui, He, Ruhan, Yu, Yongsheng
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10070247/
https://www.ncbi.nlm.nih.gov/pubmed/37012391
http://dx.doi.org/10.1038/s41598-023-32657-7
_version_ 1785018985872359424
author Hu, Fucai
Song, Xiaohui
He, Ruhan
Yu, Yongsheng
author_facet Hu, Fucai
Song, Xiaohui
He, Ruhan
Yu, Yongsheng
author_sort Hu, Fucai
collection PubMed
description This paper presents a sound source localization (SSL) model based on a residual network and a channel attention mechanism. The method takes the combination of the log-Mel spectrogram and the generalized cross-correlation phase transform (GCC-PHAT) as input features, and extracts time–frequency information using the residual structure and the channel attention mechanism, thereby achieving better localization performance. Residual blocks are introduced to extract deeper features: they allow more layers to be stacked to capture high-level features while avoiding vanishing or exploding gradients. The attention mechanism is applied in the feature extraction stage of the proposed SSL model so that it can focus on the most important information in the input features. We use signals collected by a microphone array to explore the performance of the model under different features and to find the most suitable input features for the proposed method. We compare our method with other models on a public dataset. Experimental results show a substantial improvement in sound source localization performance.
format Online
Article
Text
id pubmed-10070247
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10070247 2023-04-05 Sound source localization based on residual network and channel attention module Hu, Fucai Song, Xiaohui He, Ruhan Yu, Yongsheng Sci Rep Article This paper presents a sound source localization (SSL) model based on a residual network and a channel attention mechanism. The method takes the combination of the log-Mel spectrogram and the generalized cross-correlation phase transform (GCC-PHAT) as input features, and extracts time–frequency information using the residual structure and the channel attention mechanism, thereby achieving better localization performance. Residual blocks are introduced to extract deeper features: they allow more layers to be stacked to capture high-level features while avoiding vanishing or exploding gradients. The attention mechanism is applied in the feature extraction stage of the proposed SSL model so that it can focus on the most important information in the input features. We use signals collected by a microphone array to explore the performance of the model under different features and to find the most suitable input features for the proposed method. We compare our method with other models on a public dataset. Experimental results show a substantial improvement in sound source localization performance. Nature Publishing Group UK 2023-04-03 /pmc/articles/PMC10070247/ /pubmed/37012391 http://dx.doi.org/10.1038/s41598-023-32657-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Sound source localization based on residual network and channel attention module
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10070247/
https://www.ncbi.nlm.nih.gov/pubmed/37012391
http://dx.doi.org/10.1038/s41598-023-32657-7