
Speaker Counting Based on a Novel Hive Shaped Nested Microphone Array by WPT and 2D Adaptive SRP Algorithms in Near-Field Scenarios

Speech processing algorithms, especially sound source localization (SSL), speech enhancement, and speaker tracking, are considered to be the main fields in this application. Most speech processing algorithms need to know the number of speakers for real implementation. In this article, a novel meth...


Bibliographic Details
Main Authors:	Dehghan Firoozabadi, Ali; Adasme, Pablo; Zabala-Blanco, David; Palacios Játiva, Pablo; Azurdia-Meza, Cesar
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181562/ https://www.ncbi.nlm.nih.gov/pubmed/37177702 http://dx.doi.org/10.3390/s23094499

_version_	1785041603832840192
author	Dehghan Firoozabadi, Ali; Adasme, Pablo; Zabala-Blanco, David; Palacios Játiva, Pablo; Azurdia-Meza, Cesar
author_sort	Dehghan Firoozabadi, Ali
collection	PubMed
description	Speech processing algorithms, especially sound source localization (SSL), speech enhancement, and speaker tracking, are considered to be the main fields in this application. Most speech processing algorithms need to know the number of speakers for real implementation. In this article, a novel method for estimating the number of speakers in near-field scenarios is proposed, based on the hive shaped nested microphone array (HNMA), the wavelet packet transform (WPT), and the 2D sub-band adaptive steered response power (SB-2DASRP) with phase transform (PHAT) and maximum likelihood (ML) filters, followed by agglomerative classification and the elbow criterion to obtain the number of speakers. The proposed HNMA is designed to eliminate aliasing and imaging and to prepare suitable signals for the speaker counting method. Next, the Blackman–Tukey spectral estimation method is used to detect the relevant frequency components of the recorded signal. The WPT provides smart sub-band processing by focusing on the frequency bins of the speech signal. In addition, the SRP method is implemented in 2D and made adaptive by applying the ML and PHAT filters to the sub-band signals. The SB-2DASRP peak positions are extracted over various time frames based on the standard deviation (SD) criterion, and the final number of speakers is estimated by unsupervised agglomerative clustering and the elbow criterion. The proposed HNMA-SB-2DASRP method is compared with the frequency-domain magnitude squared coherence (FD-MSC), i-vector probabilistic linear discriminant analysis (i-vector PLDA), ambisonics features of the correlational recurrent neural network (AF-CRNN), and speaker counting by density-based classification and clustering decision (SC-DCCD) algorithms in noisy and reverberant environments; the comparison shows the superiority of the proposed method for real implementation.
format	Online Article Text
id	pubmed-10181562
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10181562 2023-05-13 Speaker Counting Based on a Novel Hive Shaped Nested Microphone Array by WPT and 2D Adaptive SRP Algorithms in Near-Field Scenarios. Dehghan Firoozabadi, Ali; Adasme, Pablo; Zabala-Blanco, David; Palacios Játiva, Pablo; Azurdia-Meza, Cesar. Sensors (Basel). Article. MDPI 2023-05-05 /pmc/articles/PMC10181562/ /pubmed/37177702 http://dx.doi.org/10.3390/s23094499 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Speaker Counting Based on a Novel Hive Shaped Nested Microphone Array by WPT and 2D Adaptive SRP Algorithms in Near-Field Scenarios
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181562/ https://www.ncbi.nlm.nih.gov/pubmed/37177702 http://dx.doi.org/10.3390/s23094499
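
The description field above names the phase transform (PHAT) filter as one of the weightings applied in the steered response power (SRP) stage. As a rough illustration of what PHAT weighting does, the Python sketch below estimates the delay between two microphone signals with the generalized cross-correlation (GCC-PHAT), the pairwise quantity that SRP-PHAT methods accumulate over microphone pairs. It is not the authors' SB-2DASRP implementation: the function name `gcc_phat_delay`, the parameters `fs` and `max_delay`, the 16 kHz sampling rate, and the synthetic white-noise signal are all assumptions made for this example, and NumPy is assumed to be available.

```python
import numpy as np


def gcc_phat_delay(x, y, fs, max_delay=None):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT.

    x, y      : 1-D arrays holding the two microphone signals (hypothetical inputs)
    fs        : sampling rate in Hz
    max_delay : optional bound on the absolute delay to search, in seconds
    """
    n = len(x) + len(y)                 # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    cross = Y * np.conj(X)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # PHAT: discard magnitude, keep only phase

    cc = np.fft.irfft(cross, n=n)       # cross-correlation; lag k sits at index k mod n

    max_shift = n // 2
    if max_delay is not None:
        max_shift = min(int(fs * max_delay), max_shift)

    # Reorder so the array covers lags -max_shift .. +max_shift in order.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


if __name__ == "__main__":
    fs = 16000                          # assumed sampling rate
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)       # synthetic source signal (assumption)
    true_shift = 5                      # y lags x by 5 samples
    y = np.concatenate((np.zeros(true_shift), x[:-true_shift]))
    est = gcc_phat_delay(x, y, fs)
    print(f"estimated delay: {est * fs:.0f} samples")
```

Because the PHAT weighting normalizes every frequency bin to unit magnitude, the correlation peak depends only on phase, which is why this weighting is commonly chosen for reverberant recordings. A full SRP search would evaluate such correlations for every microphone pair at the delays implied by each candidate source position.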


Similar Items