Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures

Bibliographic Details
Main Authors: Putta, Venkata Subbaiah, Selwin Mich Priyadharson, A., Sundramurthy, Venkatesa Prabhu
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436543/
https://www.ncbi.nlm.nih.gov/pubmed/36059405
http://dx.doi.org/10.1155/2022/4473952
author Putta, Venkata Subbaiah
Selwin Mich Priyadharson, A.
Sundramurthy, Venkatesa Prabhu
collection PubMed
description A bone-conduction microphone (BCM) senses the vibrations of the skull bones during speech and converts them into an electrical audio signal. Because BCMs capture speech from the vibration of the speaker's skull, they offer better noise resistance than standard air-conduction microphones (ACMs). BCMs also have a different frequency response than ACMs, since they capture only the low-frequency portion of the speech signal. Replacing an ACM with a BCM can therefore give satisfactory noise suppression, but speech quality and intelligibility may suffer due to the nature of solid-borne vibration. The mismatch between BCM and ACM characteristics can also degrade automatic speech recognition (ASR) performance, and it is not feasible to build a new ASR system from BCM voice data alone. The intelligibility of bone-conducted speech depends on the bone location used to acquire the signal and on how accurately the phonemes of words are modeled. Deep learning techniques such as neural networks have traditionally been used for speech recognition; however, they have a high computational cost and struggle to model phonemes in these signals. In this paper, the intelligibility of BCM speech was evaluated for three bone locations: the right ramus, the larynx, and the right mastoid. BCM signals for Tamil words were acquired at each location, and speech intelligibility was evaluated both by human listeners and by deep learning architectures such as CapsuleNet, UNet, and S-Net. As validated by both the listeners and the deep learning architectures, the larynx location provides the best speech intelligibility.
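The abstract notes that BCMs capture only the low-frequency portion of the speech signal, which is one reason intelligibility can drop when a BCM replaces an ACM. The short Python sketch below (not from the paper) illustrates this point by low-pass filtering an air-conducted recording to mimic a BCM-like response and comparing how much high-frequency energy each signal retains; the 1 kHz cutoff, the filter order, and the file name acm_tamil_word.wav are illustrative assumptions, not values reported by the authors.

# Minimal sketch: simulate a BCM-like band limitation by low-pass filtering
# an air-conducted (ACM) recording. Cutoff and file name are assumptions.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt

def simulate_bcm_response(signal, fs, cutoff_hz=1000.0, order=4):
    """Low-pass filter an ACM signal to approximate a BCM's restricted frequency response."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, signal)

def high_freq_energy_ratio(x, fs, cutoff_hz=1000.0):
    """Fraction of signal energy above the cutoff frequency."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spectrum[freqs > cutoff_hz].sum() / spectrum.sum()

fs, acm = wavfile.read("acm_tamil_word.wav")   # hypothetical air-conducted recording
acm = acm.astype(np.float64)
if acm.ndim > 1:
    acm = acm[:, 0]                            # use the first channel if the file is stereo

bcm_like = simulate_bcm_response(acm, fs)

# The BCM-like signal should retain far less high-frequency content,
# which is where much of the consonant information in speech lives.
print("ACM high-frequency energy ratio:     ", high_freq_energy_ratio(acm, fs))
print("BCM-like high-frequency energy ratio:", high_freq_energy_ratio(bcm_like, fs))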
format Online
Article
Text
id pubmed-9436543
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9436543 2022-09-02 Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures. Putta, Venkata Subbaiah; Selwin Mich Priyadharson, A.; Sundramurthy, Venkatesa Prabhu. Comput Intell Neurosci, Research Article. Hindawi, published 2022-08-25. Copyright © 2022 Venkata Subbaiah Putta et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436543/
https://www.ncbi.nlm.nih.gov/pubmed/36059405
http://dx.doi.org/10.1155/2022/4473952