Dual Microphone Voice Activity Detection Based on Reliable Spatial Cues

Two main spatial cues that can be exploited for dual microphone voice activity detection (VAD) are the interchannel time difference (ITD) and the interchannel level difference (ILD). While both ITD and ILD provide information on the location of audio sources, they may be impaired in different manner...

Bibliographic Details
Main authors: Hwang, Soojoong; Jin, Yu Gwang; Shin, Jong Won
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6678508/
https://www.ncbi.nlm.nih.gov/pubmed/31373308
http://dx.doi.org/10.3390/s19143056
author Hwang, Soojoong
Jin, Yu Gwang
Shin, Jong Won
collection PubMed
description Two main spatial cues that can be exploited for dual microphone voice activity detection (VAD) are the interchannel time difference (ITD) and the interchannel level difference (ILD). While both ITD and ILD provide information on the location of audio sources, they may be impaired in different manners by background noises and reverberation and therefore can have complementary information. Conventional approaches utilize the statistics from all frequencies with fixed weight, although the information from some time–frequency bins may degrade the performance of VAD. In this letter, we propose a dual microphone VAD scheme based on the spatial cues in reliable frequency bins only, considering the sparsity of the speech signal in the time–frequency domain. The reliability of each time–frequency bin is determined by three conditions on signal energy, ILD, and ITD. ITD-based and ILD-based VADs and statistics are evaluated using the information from selected frequency bins and then combined to produce the final VAD results. Experimental results show that the proposed frequency selective approach enhances the performances of VAD in realistic environments.
format Online
Article
Text
id pubmed-6678508
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-66785082019-08-19 Dual Microphone Voice Activity Detection Based on Reliable Spatial Cues Hwang, Soojoong Jin, Yu Gwang Shin, Jong Won Sensors (Basel) Article Two main spatial cues that can be exploited for dual microphone voice activity detection (VAD) are the interchannel time difference (ITD) and the interchannel level difference (ILD). While both ITD and ILD provide information on the location of audio sources, they may be impaired in different manners by background noises and reverberation and therefore can have complementary information. Conventional approaches utilize the statistics from all frequencies with fixed weight, although the information from some time–frequency bins may degrade the performance of VAD. In this letter, we propose a dual microphone VAD scheme based on the spatial cues in reliable frequency bins only, considering the sparsity of the speech signal in the time–frequency domain. The reliability of each time–frequency bin is determined by three conditions on signal energy, ILD, and ITD. ITD-based and ILD-based VADs and statistics are evaluated using the information from selected frequency bins and then combined to produce the final VAD results. Experimental results show that the proposed frequency selective approach enhances the performances of VAD in realistic environments. MDPI 2019-07-11 /pmc/articles/PMC6678508/ /pubmed/31373308 http://dx.doi.org/10.3390/s19143056 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Hwang, Soojoong
Jin, Yu Gwang
Shin, Jong Won
Dual Microphone Voice Activity Detection Based on Reliable Spatial Cues
title Dual Microphone Voice Activity Detection Based on Reliable Spatial Cues
title_sort dual microphone voice activity detection based on reliable spatial cues
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6678508/
https://www.ncbi.nlm.nih.gov/pubmed/31373308
http://dx.doi.org/10.3390/s19143056
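
The description field above outlines the method only at a high level: select reliable time–frequency bins using conditions on signal energy, ILD, and ITD, compute ILD-based and ITD-based statistics over the selected bins, and combine them into a final decision. The following is a minimal sketch of that idea, not the authors' implementation. The function name `reliable_bin_vad`, every threshold, the specific form of the three conditions, the averaging statistics, and the AND combination rule are all assumptions made for illustration; the record does not specify them. The sketch also assumes the target talker is closer to microphone 1, so that speech yields a positive ILD and an ITD near `itd_target_s`.

```python
import numpy as np


def reliable_bin_vad(X1, X2, freqs_hz,
                     energy_margin_db=20.0,        # hypothetical: keep bins within 20 dB of frame peak
                     ild_range_db=(-20.0, 20.0),   # hypothetical plausibility range for ILD
                     itd_limit_s=2.5e-4,           # hypothetical physical limit on |ITD|
                     ild_threshold_db=3.0,         # hypothetical ILD decision threshold
                     itd_target_s=5e-5,            # hypothetical expected ITD of the target talker
                     itd_tolerance_s=2e-5,         # hypothetical ITD decision tolerance
                     min_bins=3):                  # hypothetical minimum number of reliable bins
    """Frame-wise VAD from two STFTs of shape (n_frames, n_bins).

    freqs_hz holds the bin centre frequencies (shape (n_bins,), all > 0).
    Returns (vad, reliable): a boolean decision per frame and the
    boolean mask of bins that passed all three reliability checks.
    """
    eps = 1e-12
    p1 = np.abs(X1) ** 2
    p2 = np.abs(X2) ** 2

    # Per-bin cues.
    energy_db = 10.0 * np.log10(p1 + p2 + eps)
    ild_db = 10.0 * np.log10((p1 + eps) / (p2 + eps))
    # Phase difference converted to a delay; only unambiguous
    # while |2*pi*f*tau| < pi (no phase wrapping).
    itd_s = np.angle(X1 * np.conj(X2)) / (2.0 * np.pi * freqs_hz)

    # Three reliability conditions: energy, ILD, and ITD.
    strong = energy_db >= energy_db.max(axis=1, keepdims=True) - energy_margin_db
    ild_ok = (ild_db >= ild_range_db[0]) & (ild_db <= ild_range_db[1])
    itd_ok = np.abs(itd_s) <= itd_limit_s
    reliable = strong & ild_ok & itd_ok

    # Statistics averaged over the reliable bins of each frame.
    n_rel = reliable.sum(axis=1)
    denom = np.maximum(n_rel, 1)
    ild_stat = (ild_db * reliable).sum(axis=1) / denom
    itd_stat = (np.abs(itd_s - itd_target_s) * reliable).sum(axis=1) / denom

    # Per-cue decisions, then a simple AND combination.
    enough = n_rel >= min_bins
    ild_vad = enough & (ild_stat > ild_threshold_db)
    itd_vad = enough & (itd_stat < itd_tolerance_s)
    return ild_vad & itd_vad, reliable
```

In this sketch the inputs would typically come from a short-time Fourier transform of each microphone signal with the DC bin dropped, since the ITD estimate divides by frequency. The AND rule is the simplest possible combination; a weighted sum of the two statistics would be another reasonable choice under the same assumptions.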