Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures
Main authors: | Putta, Venkata Subbaiah; Selwin Mich Priyadharson, A.; Sundramurthy, Venkatesa Prabhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436543/ https://www.ncbi.nlm.nih.gov/pubmed/36059405 http://dx.doi.org/10.1155/2022/4473952 |
_version_ | 1784781389443366912 |
---|---|
author | Putta, Venkata Subbaiah; Selwin Mich Priyadharson, A.; Sundramurthy, Venkatesa Prabhu |
author_facet | Putta, Venkata Subbaiah; Selwin Mich Priyadharson, A.; Sundramurthy, Venkatesa Prabhu |
author_sort | Putta, Venkata Subbaiah |
collection | PubMed |
description | A bone-conducted microphone (BCM) converts vibrations from the bones of the skull during speech into an electrical audio signal. When transmitting speech signals, bone-conduction microphones (BCMs) capture speech based on the vibrations of the speaker's skull and have better noise-resistance capabilities than standard air-conduction microphones (ACMs). BCMs have a different frequency response than ACMs because they capture only the low-frequency portion of speech signals. When we replace an ACM with a BCM, we may obtain satisfactory noise suppression, but speech quality and intelligibility may suffer due to the nature of the solid vibration. Mismatched BCM and ACM characteristics can also affect ASR performance, and it is not feasible to rebuild an ASR system using only voice data from BCMs. The speech intelligibility of a BCM-conducted speech signal is determined by the location of the bone used to acquire the signal and by how accurately the phonemes of words are modeled. Deep learning techniques such as neural networks have traditionally been used for speech recognition. However, conventional neural networks have a high computational cost and are unable to model phonemes in signals. In this paper, the intelligibility of BCM speech was evaluated for different bone locations, namely the right ramus, the larynx, and the right mastoid. Listening tests and deep learning architectures such as CapsuleNet, UNet, and S-Net were used to acquire the BCM signal for Tamil words and evaluate speech intelligibility. As validated by both the listening tests and the deep learning architectures, the larynx location improves speech intelligibility. |
format | Online Article Text |
id | pubmed-9436543 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9436543 2022-09-02 Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures Putta, Venkata Subbaiah; Selwin Mich Priyadharson, A.; Sundramurthy, Venkatesa Prabhu Comput Intell Neurosci Research Article A bone-conducted microphone (BCM) converts vibrations from the bones of the skull during speech into an electrical audio signal. When transmitting speech signals, bone-conduction microphones (BCMs) capture speech based on the vibrations of the speaker's skull and have better noise-resistance capabilities than standard air-conduction microphones (ACMs). BCMs have a different frequency response than ACMs because they capture only the low-frequency portion of speech signals. When we replace an ACM with a BCM, we may obtain satisfactory noise suppression, but speech quality and intelligibility may suffer due to the nature of the solid vibration. Mismatched BCM and ACM characteristics can also affect ASR performance, and it is not feasible to rebuild an ASR system using only voice data from BCMs. The speech intelligibility of a BCM-conducted speech signal is determined by the location of the bone used to acquire the signal and by how accurately the phonemes of words are modeled. Deep learning techniques such as neural networks have traditionally been used for speech recognition. However, conventional neural networks have a high computational cost and are unable to model phonemes in signals. In this paper, the intelligibility of BCM speech was evaluated for different bone locations, namely the right ramus, the larynx, and the right mastoid. Listening tests and deep learning architectures such as CapsuleNet, UNet, and S-Net were used to acquire the BCM signal for Tamil words and evaluate speech intelligibility. As validated by both the listening tests and the deep learning architectures, the larynx location improves speech intelligibility. 
Hindawi 2022-08-25 /pmc/articles/PMC9436543/ /pubmed/36059405 http://dx.doi.org/10.1155/2022/4473952 Text en Copyright © 2022 Venkata Subbaiah Putta et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Putta, Venkata Subbaiah Selwin Mich Priyadharson, A. Sundramurthy, Venkatesa Prabhu Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_full | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_fullStr | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_full_unstemmed | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_short | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_sort | regional language speech recognition from bone-conducted speech signals through different deep learning architectures |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436543/ https://www.ncbi.nlm.nih.gov/pubmed/36059405 http://dx.doi.org/10.1155/2022/4473952 |
work_keys_str_mv | AT puttavenkatasubbaiah regionallanguagespeechrecognitionfromboneconductedspeechsignalsthroughdifferentdeeplearningarchitectures AT selwinmichpriyadharsona regionallanguagespeechrecognitionfromboneconductedspeechsignalsthroughdifferentdeeplearningarchitectures AT sundramurthyvenkatesaprabhu regionallanguagespeechrecognitionfromboneconductedspeechsignalsthroughdifferentdeeplearningarchitectures |
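The abstract notes that BCMs capture mainly the low-frequency portion of speech, which is why their frequency response differs from ACMs. The article does not give a transfer function for the BCM channel; as a rough illustration only, a BCM recording is sometimes approximated as a low-pass-filtered version of the air-conducted signal. The sketch below uses a simple one-pole low-pass filter with an arbitrary, assumed cutoff of 700 Hz (not a value from the article) to show how a low tone survives such a channel while a high tone is strongly attenuated:

```python
import math

def one_pole_lowpass(x, fs, fc):
    """One-pole low-pass filter: y[n] = y[n-1] + a*(x[n] - y[n-1]),
    with a = dt / (RC + dt) and cutoff fc = 1 / (2*pi*RC)."""
    rc = 1.0 / (2.0 * math.pi * fc)
    dt = 1.0 / fs
    a = dt / (rc + dt)
    y, prev = [], 0.0
    for sample in x:
        prev = prev + a * (sample - prev)
        y.append(prev)
    return y

def rms(x):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 16000  # sampling rate in Hz
fc = 700    # assumed BCM-like cutoff (illustrative, not from the article)
n = fs      # one second of signal

# A low tone (200 Hz) and a high tone (3000 Hz), both unit amplitude.
low = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]
high = [math.sin(2 * math.pi * 3000 * t / fs) for t in range(n)]

# The low tone passes nearly unchanged; the high tone is strongly attenuated,
# mimicking the band-limited character of bone-conducted speech.
print(rms(one_pole_lowpass(low, fs, fc)))
print(rms(one_pole_lowpass(high, fs, fc)))
```

This is only a toy model of the band-limiting effect; the actual BCM response depends on the sensor and on the bone location (ramus, larynx, or mastoid) studied in the article.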