Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments
Main Authors: | Noh, Kyoungjin; Chang, Joon-Hyuk |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2020 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7180550/ https://www.ncbi.nlm.nih.gov/pubmed/32231161 http://dx.doi.org/10.3390/s20071883 |
author | Noh, Kyoungjin; Chang, Joon-Hyuk
---|---|
collection | PubMed |
description | In this paper, we propose the joint optimization of deep neural network (DNN)-supported dereverberation and beamforming for convolutional recurrent neural network (CRNN)-based sound event detection (SED) in multi-channel environments. First, short-time Fourier transform (STFT) coefficients are calculated from the multi-channel audio signals captured in noisy and reverberant environments and are enhanced by DNN-supported weighted prediction error (WPE) dereverberation using masks estimated by the DNN. Next, the STFT coefficients of the dereverberated multi-channel audio signals are passed to the DNN-supported minimum variance distortionless response (MVDR) beamformer, which performs beamforming using source and noise masks estimated by the DNN. The resulting single-channel enhanced STFT coefficients are then fed to the CRNN-based SED system, and the three modules are jointly trained with a single loss function designed for SED. Furthermore, to ease the difficulty of training a deep learning model for SED caused by the imbalance in the amount of data per class, the focal loss is used as the loss function. Experimental results show that jointly training the DNN-supported dereverberation and beamforming modules with the SED model under focal-loss supervision significantly improves performance in noisy and reverberant environments. (Illustrative sketches of the focal loss and of mask-based MVDR beamforming follow the record fields below.) |
format | Online Article Text |
id | pubmed-7180550 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7180550 2020-05-01 Sensors (Basel) Article MDPI 2020-03-28 /pmc/articles/PMC7180550/ /pubmed/32231161 http://dx.doi.org/10.3390/s20071883 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7180550/ https://www.ncbi.nlm.nih.gov/pubmed/32231161 http://dx.doi.org/10.3390/s20071883 |
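The description above outlines a pipeline of DNN-supported WPE dereverberation, DNN-supported MVDR beamforming, and a CRNN-based SED back end trained jointly under a focal loss. The two sketches below illustrate the general form of two of those ingredients. They are minimal illustrations based on standard formulations in the literature, not the authors' code; every function and parameter name here is assumed for illustration only.

First, a minimal sketch of the per-class binary focal loss as it is commonly applied to multi-label SED; the `alpha` and `gamma` values are the usual defaults and are not taken from the paper.

```python
import numpy as np

def binary_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-class binary focal loss for multi-label SED.

    y_true: (frames, classes) binary event-activity targets.
    y_pred: (frames, classes) sigmoid outputs in (0, 1).
    alpha, gamma: focal-loss hyperparameters (common defaults, assumed here).
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # p_t is the predicted probability assigned to the ground-truth label.
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); the (1 - p_t)^gamma
    # factor down-weights easy examples, easing class imbalance in SED data.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return loss.mean()
```

Second, a sketch of mask-based MVDR beamforming in the Souden formulation, one common way to realize a DNN-supported MVDR stage: spatial covariance matrices are estimated from the multi-channel STFT weighted by source and noise masks, and a single reference channel is enhanced. Whether the paper uses this exact formulation is an assumption.

```python
import numpy as np

def mask_based_mvdr(stft, speech_mask, noise_mask, ref_ch=0, eps=1e-9):
    """Apply a mask-based MVDR beamformer per frequency bin.

    stft: (channels, frames, freqs) complex multi-channel STFT.
    speech_mask, noise_mask: (frames, freqs) masks in [0, 1], e.g. from a DNN.
    Returns the (frames, freqs) single-channel enhanced STFT.
    """
    C, T, F = stft.shape
    out = np.zeros((T, F), dtype=complex)
    for f in range(F):
        X = stft[:, :, f]  # (C, T)
        # Mask-weighted spatial covariance matrices (C x C).
        phi_s = (speech_mask[:, f] * X) @ X.conj().T / max(speech_mask[:, f].sum(), eps)
        phi_n = (noise_mask[:, f] * X) @ X.conj().T / max(noise_mask[:, f].sum(), eps)
        phi_n += eps * np.eye(C)  # diagonal loading for numerical stability
        # Souden MVDR: w = (phi_n^{-1} phi_s u) / trace(phi_n^{-1} phi_s),
        # where u selects the reference channel.
        num = np.linalg.solve(phi_n, phi_s)
        w = num[:, ref_ch] / (np.trace(num) + eps)
        out[:, f] = w.conj() @ X  # y = w^H x for every frame
    return out
```

In the jointly trained system the abstract describes, the masks would be produced by the dereverberation and beamforming DNNs, and the SED focal loss would back-propagate through these operations, which is what makes the end-to-end joint optimization of all three modules possible.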