
Detecting Lombard Speech Using Deep Learning Approach

Bibliographic Details
Main authors:	Kąkol, Krzysztof; Korvel, Gražina; Tamulevičius, Gintautas; Kostek, Bożena
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9824848/ https://www.ncbi.nlm.nih.gov/pubmed/36616913 http://dx.doi.org/10.3390/s23010315

author	Kąkol, Krzysztof; Korvel, Gražina; Tamulevičius, Gintautas; Kostek, Bożena
collection	PubMed
description	Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications, such as public address systems, that work in near real time. The paper starts with background on the Lombard effect. Then, the assumptions underlying the Lombard speech detection work are outlined. The proposed framework combines convolutional neural networks (CNNs) with various two-dimensional (2D) speech signal representations. To reduce the computational cost without abandoning the 2D representation-based approach, a strategy for threshold-based averaging of the Lombard effect detection results is introduced. The pseudocode of the averaging process is also included. A series of experiments is performed to determine the most effective network structure and 2D speech signal representation. Investigations are carried out on German and Polish recordings containing Lombard speech. All 2D speech signal representations are tested with and without augmentation. Here, augmentation means using the alpha channel to store additional data: the gender of the speaker, the fundamental frequency (F0), and the first two mel-frequency cepstral coefficients (MFCCs). The experimental results show that Lombard and neutral speech recordings can be clearly distinguished with high detection accuracy. It is also demonstrated that the proposed speech detection process is capable of working in near real time. These are the key contributions of this work.
format	Online Article Text
id	pubmed-9824848
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
source	Sensors (Basel), MDPI, published 2022-12-28
rights	© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Detecting Lombard Speech Using Deep Learning Approach
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9824848/ https://www.ncbi.nlm.nih.gov/pubmed/36616913 http://dx.doi.org/10.3390/s23010315

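The description mentions a threshold-based averaging of the Lombard effect detection results but gives no details, and the authors' pseudocode is not reproduced in this record. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a classifier (for example, the CNN) emits one Lombard probability per short speech segment, averages the most recent `window` scores, and compares that mean against `threshold`. The function name `threshold_average` and both parameters are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def threshold_average(scores: Iterable[float], window: int = 5,
                      threshold: float = 0.5) -> Iterator[bool]:
    """Yield a Lombard (True) / neutral (False) decision per segment score.

    Hypothetical sketch: each score is assumed to be a classifier's
    probability that one short speech segment is Lombard speech. This is
    not the pseudocode from the paper.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    # Keep only the newest `window` scores; older ones are dropped automatically.
    recent: deque[float] = deque(maxlen=window)
    for score in scores:
        recent.append(score)
        # Until the buffer fills, average over however many scores exist so far.
        mean = sum(recent) / len(recent)
        yield mean >= threshold
```

For example, `list(threshold_average([0.1, 0.8, 0.1, 0.1, 0.8, 0.9, 0.85], window=3, threshold=0.5))` returns `[False, False, False, False, False, True, True]`: the isolated high score in the second segment is averaged away, while the sustained run at the end is reported as Lombard speech. The trade-off of this kind of smoothing is latency, since a decision can lag a change in speaking style by up to `window` segments.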

Similar Items