Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans
Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face…
Main Authors: Varano, Enrico; Vougioukas, Konstantinos; Ma, Pingchuan; Petridis, Stavros; Pantic, Maja; Reichenbach, Tobias
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2022
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8766421/ https://www.ncbi.nlm.nih.gov/pubmed/35069100 http://dx.doi.org/10.3389/fnins.2021.781196
_version_ | 1784634526809456640 |
author | Varano, Enrico; Vougioukas, Konstantinos; Ma, Pingchuan; Petridis, Stavros; Pantic, Maja; Reichenbach, Tobias |
author_facet | Varano, Enrico; Vougioukas, Konstantinos; Ma, Pingchuan; Petridis, Stavros; Pantic, Maja; Reichenbach, Tobias |
author_sort | Varano, Enrico |
collection | PubMed |
description | Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield an even higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments. |
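The abstract describes an end-to-end pipeline: a speech waveform and a single still image of the speaker go in, and a photorealistic, lip-synchronized video comes out. As a rough illustration of that input/output contract only, here is a minimal Python sketch; `generate_video` is a hypothetical stand-in, not the authors' GAN, and its body is a stub that merely repeats the still frame where the real model would animate the face conditioned on the audio.

```python
import numpy as np

def generate_video(audio: np.ndarray, still_image: np.ndarray,
                   sample_rate: int = 16_000, fps: int = 25) -> np.ndarray:
    """Hypothetical speech-to-video interface (illustrative stub).

    audio:       1-D waveform, shape (num_samples,)
    still_image: identity frame, shape (H, W, 3), values in [0, 1]
    Returns a video of shape (num_frames, H, W, 3) whose length
    matches the audio duration.
    """
    num_frames = int(len(audio) / sample_rate * fps)
    # Stub: repeat the identity frame for every video frame. A real
    # speech-driven GAN would instead generate mouth and face motion
    # synchronized to the audio.
    return np.repeat(still_image[None, ...], num_frames, axis=0)

# Usage: 2 s of (silent) audio and a dummy 96x96 face give a 50-frame clip.
audio = np.zeros(2 * 16_000)
face = np.full((96, 96, 3), 0.5, dtype=np.float32)
video = generate_video(audio, face)
print(video.shape)  # (50, 96, 96, 3)
```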
format | Online Article Text |
id | pubmed-8766421 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8766421 2022-01-20 Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans Varano, Enrico; Vougioukas, Konstantinos; Ma, Pingchuan; Petridis, Stavros; Pantic, Maja; Reichenbach, Tobias Front Neurosci Neuroscience Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield an even higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments. Frontiers Media S.A. 2022-01-05 /pmc/articles/PMC8766421/ /pubmed/35069100 http://dx.doi.org/10.3389/fnins.2021.781196 Text en Copyright © 2022 Varano, Vougioukas, Ma, Petridis, Pantic and Reichenbach. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience; Varano, Enrico; Vougioukas, Konstantinos; Ma, Pingchuan; Petridis, Stavros; Pantic, Maja; Reichenbach, Tobias; Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans |
title | Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans |
title_full | Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans |
title_fullStr | Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans |
title_full_unstemmed | Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans |
title_short | Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans |
title_sort | speech-driven facial animations improve speech-in-noise comprehension of humans |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8766421/ https://www.ncbi.nlm.nih.gov/pubmed/35069100 http://dx.doi.org/10.3389/fnins.2021.781196 |
work_keys_str_mv | AT varanoenrico speechdrivenfacialanimationsimprovespeechinnoisecomprehensionofhumans AT vougioukaskonstantinos speechdrivenfacialanimationsimprovespeechinnoisecomprehensionofhumans AT mapingchuan speechdrivenfacialanimationsimprovespeechinnoisecomprehensionofhumans AT petridisstavros speechdrivenfacialanimationsimprovespeechinnoisecomprehensionofhumans AT panticmaja speechdrivenfacialanimationsimprovespeechinnoisecomprehensionofhumans AT reichenbachtobias speechdrivenfacialanimationsimprovespeechinnoisecomprehensionofhumans |