Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures
Main authors: | Putta, Venkata Subbaiah; Selwin Mich Priyadharson, A.; Sundramurthy, Venkatesa Prabhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436543/ https://www.ncbi.nlm.nih.gov/pubmed/36059405 http://dx.doi.org/10.1155/2022/4473952 |
_version_ | 1784781389443366912 |
---|---|
author | Putta, Venkata Subbaiah; Selwin Mich Priyadharson, A.; Sundramurthy, Venkatesa Prabhu |
author_facet | Putta, Venkata Subbaiah; Selwin Mich Priyadharson, A.; Sundramurthy, Venkatesa Prabhu |
author_sort | Putta, Venkata Subbaiah |
collection | PubMed |
description | A bone-conducted microphone (BCM) converts vibrations from the bones of the skull during speech into an electrical audio signal. When transmitting speech signals, bone-conduction microphones (BCMs) capture speech based on the vibrations of the speaker's skull and have better noise-resistance capabilities than standard air-conduction microphones (ACMs). BCMs have a different frequency response than ACMs because they capture only the low-frequency portion of speech signals. When we replace an ACM with a BCM, we may obtain satisfactory noise suppression, but speech quality and intelligibility may suffer due to the nature of the solid vibration. Mismatched BCM and ACM characteristics can also affect ASR performance, and it is not feasible to rebuild an ASR system using only voice data from BCMs. The speech intelligibility of a BCM-conducted speech signal is determined by the location of the bone used to acquire the signal and by how accurately the phonemes of words are modeled. Deep learning techniques such as neural networks have traditionally been used for speech recognition. However, conventional neural networks have a high computational cost and are unable to model phonemes in signals. In this paper, the intelligibility of BCM speech was evaluated for different bone locations, namely the right ramus, the larynx, and the right mastoid. Listening tests and deep learning architectures such as CapsuleNet, UNet, and S-Net were used to acquire the BCM signal for Tamil words and evaluate speech intelligibility. As validated by both the listening tests and the deep learning architectures, the larynx location improves speech intelligibility. |
format | Online Article Text |
id | pubmed-9436543 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9436543 2022-09-02 Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures Putta, Venkata Subbaiah; Selwin Mich Priyadharson, A.; Sundramurthy, Venkatesa Prabhu Comput Intell Neurosci Research Article A bone-conducted microphone (BCM) converts vibrations from the bones of the skull during speech into an electrical audio signal. When transmitting speech signals, bone-conduction microphones (BCMs) capture speech based on the vibrations of the speaker's skull and have better noise-resistance capabilities than standard air-conduction microphones (ACMs). BCMs have a different frequency response than ACMs because they capture only the low-frequency portion of speech signals. When we replace an ACM with a BCM, we may obtain satisfactory noise suppression, but speech quality and intelligibility may suffer due to the nature of the solid vibration. Mismatched BCM and ACM characteristics can also affect ASR performance, and it is not feasible to rebuild an ASR system using only voice data from BCMs. The speech intelligibility of a BCM-conducted speech signal is determined by the location of the bone used to acquire the signal and by how accurately the phonemes of words are modeled. Deep learning techniques such as neural networks have traditionally been used for speech recognition. However, conventional neural networks have a high computational cost and are unable to model phonemes in signals. In this paper, the intelligibility of BCM speech was evaluated for different bone locations, namely the right ramus, the larynx, and the right mastoid. Listening tests and deep learning architectures such as CapsuleNet, UNet, and S-Net were used to acquire the BCM signal for Tamil words and evaluate speech intelligibility. As validated by both the listening tests and the deep learning architectures, the larynx location improves speech intelligibility. 
Hindawi 2022-08-25 /pmc/articles/PMC9436543/ /pubmed/36059405 http://dx.doi.org/10.1155/2022/4473952 Text en Copyright © 2022 Venkata Subbaiah Putta et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Putta, Venkata Subbaiah Selwin Mich Priyadharson, A. Sundramurthy, Venkatesa Prabhu Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_full | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_fullStr | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_full_unstemmed | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_short | Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures |
title_sort | regional language speech recognition from bone-conducted speech signals through different deep learning architectures |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436543/ https://www.ncbi.nlm.nih.gov/pubmed/36059405 http://dx.doi.org/10.1155/2022/4473952 |
work_keys_str_mv | AT puttavenkatasubbaiah regionallanguagespeechrecognitionfromboneconductedspeechsignalsthroughdifferentdeeplearningarchitectures AT selwinmichpriyadharsona regionallanguagespeechrecognitionfromboneconductedspeechsignalsthroughdifferentdeeplearningarchitectures AT sundramurthyvenkatesaprabhu regionallanguagespeechrecognitionfromboneconductedspeechsignalsthroughdifferentdeeplearningarchitectures |
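The abstract notes that BCMs capture mainly the low-frequency portion of speech, which is why their frequency response differs from ACMs. The article does not give a transfer function for the BCM channel; as a rough illustration only, a BCM recording is sometimes approximated as a low-pass-filtered version of the air-conducted signal. The sketch below uses a simple one-pole low-pass filter with an arbitrary, assumed cutoff of 700 Hz (not a value from the article) to show how a low tone survives such a channel while a high tone is strongly attenuated:

```python
import math

def one_pole_lowpass(x, fs, fc):
    """One-pole low-pass filter: y[n] = y[n-1] + a*(x[n] - y[n-1]),
    with a = dt / (RC + dt) and cutoff fc = 1 / (2*pi*RC)."""
    rc = 1.0 / (2.0 * math.pi * fc)
    dt = 1.0 / fs
    a = dt / (rc + dt)
    y, prev = [], 0.0
    for sample in x:
        prev = prev + a * (sample - prev)
        y.append(prev)
    return y

def rms(x):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 16000  # sampling rate in Hz
fc = 700    # assumed BCM-like cutoff (illustrative, not from the article)
n = fs      # one second of signal

# A low tone (200 Hz) and a high tone (3000 Hz), both unit amplitude.
low = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]
high = [math.sin(2 * math.pi * 3000 * t / fs) for t in range(n)]

# The low tone passes nearly unchanged; the high tone is strongly attenuated,
# mimicking the band-limited character of bone-conducted speech.
print(rms(one_pole_lowpass(low, fs, fc)))
print(rms(one_pole_lowpass(high, fs, fc)))
```

This is only a toy model of the band-limiting effect; the actual BCM response depends on the sensor and on the bone location (ramus, larynx, or mastoid) studied in the article.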