
Prediction of Voice Fundamental Frequency and Intensity from Surface Electromyographic Signals of the Face and Neck

Bibliographic Details
Main authors: Vojtech, Jennifer M., Mitchell, Claire L., Raiff, Laura, Kline, Joshua C., De Luca, Gianluca
Format: Online, Article, Text
Language: English
Published: 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9592063/
https://www.ncbi.nlm.nih.gov/pubmed/36299552
http://dx.doi.org/10.3390/vibration5040041
collection PubMed
description Silent speech interfaces (SSIs) enable speech recognition and synthesis in the absence of an acoustic signal. Yet the archetypal SSI fails to convey the expressive attributes of prosody, such as pitch and loudness, leading to lexical ambiguities. The aim of this study was to determine the efficacy of surface electromyography (sEMG) for predicting continuous acoustic estimates of prosody. Ten participants performed a series of vocal tasks, including sustained vowels, phrases, and monologues, while acoustic data were recorded simultaneously with sEMG activity from muscles of the face and neck. A battery of time-, frequency-, and cepstral-domain features extracted from the sEMG signals was used to train deep regression neural networks to predict the fundamental frequency and intensity contours derived from the acoustic signals. We achieved an average accuracy of 0.01 semitones (ST) and a precision of 0.56 ST for the estimation of fundamental frequency, and an average accuracy of 0.21 dB SPL and a precision of 3.25 dB SPL for the estimation of intensity. This work highlights the utility of sEMG as an alternative means of detecting prosody and shows promise for the future development of SSIs.
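To make the units in the abstract concrete, the Python sketch below converts fundamental frequency from hertz to semitones (12 times the base-2 logarithm of a frequency ratio) and RMS sound pressure to dB SPL (re 20 µPa), computes two common time-domain sEMG amplitude features (root mean square and mean absolute value), and summarizes frame-wise prediction error as a mean and a standard deviation. It is an illustrative sketch, not the authors' pipeline. Several choices are assumptions not stated in this record: the 100 Hz semitone reference (`REF_HZ`), the use of RMS and MAV as example features, and the reading of "accuracy" as mean signed error and "precision" as the standard deviation of that error.

```python
import math
import statistics
from typing import Sequence, Tuple

# Hypothetical reference frequency for the semitone scale. For error analysis
# any positive value works: the difference between two ST values equals
# 12 * log2(f_a / f_b) and does not depend on the reference.
REF_HZ = 100.0

# Standard reference pressure for dB SPL in air: 20 micropascals.
P_REF_PA = 20e-6


def hz_to_semitones(f0_hz: float, ref_hz: float = REF_HZ) -> float:
    """Convert a fundamental frequency in Hz to semitones relative to ref_hz."""
    if f0_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_hz / ref_hz)


def pressure_to_db_spl(p_rms_pa: float) -> float:
    """Convert an RMS sound pressure in pascals to dB SPL (re 20 uPa)."""
    if p_rms_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)


def rms(window: Sequence[float]) -> float:
    """Root-mean-square amplitude of one sEMG analysis window.

    RMS is a common time-domain sEMG feature; whether it was among the
    study's features is an assumption, not stated in the record.
    """
    if not window:
        raise ValueError("window must not be empty")
    return math.sqrt(sum(x * x for x in window) / len(window))


def mean_absolute_value(window: Sequence[float]) -> float:
    """Mean absolute value (MAV) of one sEMG window, another common time-domain feature."""
    if not window:
        raise ValueError("window must not be empty")
    return sum(abs(x) for x in window) / len(window)


def error_summary(reference: Sequence[float],
                  predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (mean signed error, sample standard deviation of error).

    Assumption: "accuracy" is read as the mean signed error (bias) across
    frames and "precision" as the spread of that error; the record does
    not define either term.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if len(reference) < 2:
        raise ValueError("need at least two frames")
    errors = [p - r for r, p in zip(reference, predicted)]
    return statistics.mean(errors), statistics.stdev(errors)
```

For example, converting per-frame F0 contours in Hz with `hz_to_semitones` before calling `error_summary` yields errors in ST, the unit the abstract reports for fundamental frequency. Working in semitones rather than hertz makes an error of a given size mean the same perceptual pitch interval whether the speaker's voice is low or high.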
id pubmed-9592063
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Vibration
dates 2022-12; 2022-10-13
license This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).