Plain-to-clear speech video conversion for enhanced intelligibility

Bibliographic Details
Main authors: Sachdeva, Shubam; Ruan, Haoyao; Hamarneh, Ghassan; Behne, Dawn M.; Jongman, Allard; Sereno, Joan A.; Wang, Yue
Format: Online; Article; Text
Language: English
Published: Springer US, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10042924/
https://www.ncbi.nlm.nih.gov/pubmed/37008883
http://dx.doi.org/10.1007/s10772-023-10018-z
Collection: PubMed

Abstract: Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine whether visible speech cues in video alone can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels, produced by multiple male and female talkers. Using a frame-by-frame, image-warping-based video generation method with a controllable parameter (the displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear-speech videos. We evaluate the generated videos using a robust, state-of-the-art AI lip reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for modifying videos across speech styles and achieve enhanced intelligibility for the AI lip reader; (2) this work suggests that universal, talker-independent clear-speech features may be used to modify any talker’s visual speech style; (3) we introduce the “displacement factor” as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the generated videos are high definition, which makes them ideal candidates for human-centric intelligibility and perceptual training studies.
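
The abstract describes the displacement factor only at a high level. As a rough illustration, the sketch below assumes that clear-speech visual features can be represented as per-landmark (x, y) offsets between time-aligned, size-normalized clear and plain lip landmarks, averaged across talkers to approximate a talker-independent feature, and that the displacement factor linearly scales those offsets before they are added to a new talker's plain-speech landmarks. The landmark representation, the averaging, the linear scaling, and the function names `mean_clear_offsets` and `apply_displacement` are all hypothetical and are not taken from the paper. The per-frame image warp that would move pixels to the target landmarks (for example, a piecewise-affine warp) is not shown.

```python
from typing import List, Tuple

# Hypothetical representation (assumption, not from the paper): one (x, y)
# position per lip landmark, already time-aligned and size-normalized.
Point = Tuple[float, float]
Frame = List[Point]


def mean_clear_offsets(plain: List[Frame], clear: List[Frame]) -> Frame:
    """Average the clear-minus-plain offset of each landmark over paired samples.

    plain[i] and clear[i] are assumed to be matching samples (same word,
    same talker, same landmark order). Averaging over several talkers is
    only a stand-in for a talker-independent clear-speech feature.
    """
    if not plain or len(plain) != len(clear):
        raise ValueError("need non-empty, equal-length lists of paired frames")
    n_points = len(plain[0])
    sums = [[0.0, 0.0] for _ in range(n_points)]
    for p_frame, c_frame in zip(plain, clear):
        # zip() would silently truncate mismatched frames, so check explicitly.
        if len(p_frame) != n_points or len(c_frame) != n_points:
            raise ValueError("every frame must have the same number of landmarks")
        for k, ((px, py), (cx, cy)) in enumerate(zip(p_frame, c_frame)):
            sums[k][0] += cx - px
            sums[k][1] += cy - py
    n = len(plain)
    return [(sx / n, sy / n) for sx, sy in sums]


def apply_displacement(frame: Frame, offsets: Frame, displacement_factor: float) -> Frame:
    """Shift each landmark by displacement_factor times its clear-speech offset.

    0.0 leaves the plain landmarks unchanged, 1.0 adds the averaged offset once,
    and larger values exaggerate the plain-to-clear difference (linear scaling
    is an assumption of this sketch).
    """
    if len(frame) != len(offsets):
        raise ValueError("frame and offsets must have the same number of landmarks")
    return [
        (x + displacement_factor * dx, y + displacement_factor * dy)
        for (x, y), (dx, dy) in zip(frame, offsets)
    ]


if __name__ == "__main__":
    # Toy data: two paired samples, each with an upper-lip and a lower-lip landmark.
    plain = [[(0.0, 1.0), (0.0, -1.0)], [(0.0, 0.5), (0.0, -0.5)]]
    clear = [[(0.0, 1.5), (0.0, -1.5)], [(0.0, 0.75), (0.0, -0.75)]]
    offsets = mean_clear_offsets(plain, clear)
    print(offsets)  # [(0.0, 0.375), (0.0, -0.375)]
    new_talker = [(0.0, 2.0), (0.0, -2.0)]
    print(apply_displacement(new_talker, offsets, 2.0))  # [(0.0, 2.75), (0.0, -2.75)]
```

Under these assumptions, a single scalar controls how far every landmark moves toward (or beyond) the averaged clear-speech configuration, which is one simple way to realize the "systematic scaling" the abstract attributes to the displacement factor.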

Record ID: pubmed-10042924
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: International Journal of Speech Technology (Int J Speech Technol), Springer US, 2023-01-28
Copyright: © The Author(s) 2023
License: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.