Sound source localization based on residual network and channel attention module
This paper presents a sound source localization (SSL) model based on a residual network and a channel attention mechanism. The method takes the combination of log-Mel spectrogram and generalized cross-correlation phase transform (GCC-PHAT) as the input features, and extracts the time–frequency information by using the residual structure and channel attention mechanism, thus obtaining better localization performance. The residual blocks are introduced to extract deeper features: they allow more layers to be stacked for high-level features while avoiding vanishing or exploding gradients. The attention mechanism is applied in the feature extraction stage of the proposed SSL model, so the network can focus on the most important information in the input features. We use signals collected by a microphone array to explore the performance of the model under different features, and find the most suitable input features for the proposed method. We compare our method with other models on a public dataset. Experimental results show a substantial improvement in sound source localization performance.
Main Authors: | Hu, Fucai; Song, Xiaohui; He, Ruhan; Yu, Yongsheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10070247/ https://www.ncbi.nlm.nih.gov/pubmed/37012391 http://dx.doi.org/10.1038/s41598-023-32657-7 |
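The abstract above names two input features: the per-channel log-Mel spectrogram and the pairwise GCC-PHAT. The sketch below shows one conventional way to compute them with librosa and NumPy; the frame length, hop size, number of Mel bands, and lag window are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: compute the two input features named in the abstract,
# a log-Mel spectrogram per microphone channel and GCC-PHAT per microphone pair.
# This is NOT the authors' exact pipeline; all parameters are assumptions.
import numpy as np
import librosa

def log_mel(signal, sr=16000, n_fft=1024, hop=512, n_mels=64):
    """Log-Mel spectrogram of one channel, shape (n_mels, frames)."""
    mel = librosa.feature.melspectrogram(
        y=signal, sr=sr, n_fft=n_fft, hop_length=hop, n_mels=n_mels)
    return librosa.power_to_db(mel)

def gcc_phat(sig_a, sig_b, n_fft=1024, hop=512, max_lag=32):
    """Framewise GCC-PHAT between two channels, shape (2*max_lag+1, frames)."""
    A = librosa.stft(sig_a, n_fft=n_fft, hop_length=hop)
    B = librosa.stft(sig_b, n_fft=n_fft, hop_length=hop)
    cross = A * np.conj(B)
    phat = cross / (np.abs(cross) + 1e-8)           # phase transform weighting
    cc = np.fft.irfft(phat, n=n_fft, axis=0)        # back to the lag domain
    # keep only the lags around zero, ordered from -max_lag to +max_lag
    cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]], axis=0)
    return cc

# Stacking the per-channel log-Mel maps and the per-pair GCC-PHAT maps along a
# "channel" axis yields the combined time-frequency input the abstract describes.
```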
_version_ | 1785018985872359424 |
---|---|
author | Hu, Fucai Song, Xiaohui He, Ruhan Yu, Yongsheng |
author_facet | Hu, Fucai Song, Xiaohui He, Ruhan Yu, Yongsheng |
author_sort | Hu, Fucai |
collection | PubMed |
description | This paper presents a sound source localization (SSL) model based on a residual network and a channel attention mechanism. The method takes the combination of log-Mel spectrogram and generalized cross-correlation phase transform (GCC-PHAT) as the input features, and extracts the time–frequency information by using the residual structure and channel attention mechanism, thus obtaining better localization performance. The residual blocks are introduced to extract deeper features: they allow more layers to be stacked for high-level features while avoiding vanishing or exploding gradients. The attention mechanism is applied in the feature extraction stage of the proposed SSL model, so the network can focus on the most important information in the input features. We use signals collected by a microphone array to explore the performance of the model under different features, and find the most suitable input features for the proposed method. We compare our method with other models on a public dataset. Experimental results show a substantial improvement in sound source localization performance. |
format | Online Article Text |
id | pubmed-10070247 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10070247 2023-04-05 Sound source localization based on residual network and channel attention module Hu, Fucai Song, Xiaohui He, Ruhan Yu, Yongsheng Sci Rep Article This paper presents a sound source localization (SSL) model based on a residual network and a channel attention mechanism. The method takes the combination of log-Mel spectrogram and generalized cross-correlation phase transform (GCC-PHAT) as the input features, and extracts the time–frequency information by using the residual structure and channel attention mechanism, thus obtaining better localization performance. The residual blocks are introduced to extract deeper features: they allow more layers to be stacked for high-level features while avoiding vanishing or exploding gradients. The attention mechanism is applied in the feature extraction stage of the proposed SSL model, so the network can focus on the most important information in the input features. We use signals collected by a microphone array to explore the performance of the model under different features, and find the most suitable input features for the proposed method. We compare our method with other models on a public dataset. Experimental results show a substantial improvement in sound source localization performance. Nature Publishing Group UK 2023-04-03 /pmc/articles/PMC10070247/ /pubmed/37012391 http://dx.doi.org/10.1038/s41598-023-32657-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Hu, Fucai Song, Xiaohui He, Ruhan Yu, Yongsheng Sound source localization based on residual network and channel attention module |
title | Sound source localization based on residual network and channel attention module |
title_full | Sound source localization based on residual network and channel attention module |
title_fullStr | Sound source localization based on residual network and channel attention module |
title_full_unstemmed | Sound source localization based on residual network and channel attention module |
title_short | Sound source localization based on residual network and channel attention module |
title_sort | sound source localization based on residual network and channel attention module |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10070247/ https://www.ncbi.nlm.nih.gov/pubmed/37012391 http://dx.doi.org/10.1038/s41598-023-32657-7 |
work_keys_str_mv | AT hufucai soundsourcelocalizationbasedonresidualnetworkandchannelattentionmodule AT songxiaohui soundsourcelocalizationbasedonresidualnetworkandchannelattentionmodule AT heruhan soundsourcelocalizationbasedonresidualnetworkandchannelattentionmodule AT yuyongsheng soundsourcelocalizationbasedonresidualnetworkandchannelattentionmodule |
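The description field above also names the network side of the method: residual blocks combined with a channel attention mechanism. One common way to realize that pairing is a squeeze-and-excitation style block inside a residual connection; the sketch below shows such a block in PyTorch. The layer sizes, reduction ratio, and the choice of SE-style attention are assumptions for illustration, not the authors' exact architecture.

```python
# Hedged sketch of a residual block with squeeze-and-excitation style channel
# attention, one conventional realization of "residual network + channel
# attention". Not the paper's exact design; hyperparameters are placeholders.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation: reweight feature-map channels by global context."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global average pool
        self.fc = nn.Sequential(                     # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # per-channel reweighting

class SEResidualBlock(nn.Module):
    """Two 3x3 convolutions plus channel attention, with an identity skip."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.attn = ChannelAttention(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.attn(self.body(x)))  # residual (skip) connection

if __name__ == "__main__":
    # Example: a batch of 4 inputs with 16 feature channels over a (freq, time) grid,
    # e.g. stacked log-Mel and GCC-PHAT maps projected to 16 channels upstream.
    block = SEResidualBlock(channels=16)
    out = block(torch.randn(4, 16, 64, 100))   # (batch, channels, freq, time)
    print(out.shape)                            # torch.Size([4, 16, 64, 100])
```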