The interrelationship between the face and vocal tract configuration during audiovisual speech

It is well established that speech perception is improved when we are able to see the speaker talking along with hearing their voice, especially when the speech is noisy. While we have a good understanding of where speech integration occurs in the brain, it is unclear how visual and auditory cues are combined to improve speech perception. One suggestion is that integration can occur as both visual and auditory cues arise from a common generator: the vocal tract. Here, we investigate whether facial and vocal tract movements are linked during speech production by comparing videos of the face and fast magnetic resonance (MR) image sequences of the vocal tract. The joint variation in the face and vocal tract was extracted using an application of principal components analysis (PCA), and we demonstrate that MR image sequences can be reconstructed with high fidelity using only the facial video and PCA. Reconstruction fidelity was significantly higher when images from the two sequences corresponded in time, and including implicit temporal information by combining contiguous frames also led to a significant increase in fidelity. A “Bubbles” technique was used to identify which areas of the face were important for recovering information about the vocal tract, and vice versa, on a frame-by-frame basis. Our data reveal that there is sufficient information in the face to recover vocal tract shape during speech. In addition, the facial and vocal tract regions that are important for reconstruction are those that are used to generate the acoustic speech signal.
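
The analysis described above rests on a joint PCA over paired face and vocal-tract images, with one modality then reconstructed from the other. Below is a minimal Python sketch of that idea on synthetic stand-in data; the array sizes, variable names, and the least-squares projection step are illustrative assumptions, not the authors' published pipeline (which additionally exploits temporal structure by combining contiguous frames).

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in data: n_frames paired observations of flattened face-video
# frames and flattened MR vocal-tract frames (hypothetical sizes).
n_frames, n_face_px, n_mr_px = 200, 1024, 768
face = rng.standard_normal((n_frames, n_face_px))
mr = 0.5 * face[:, :n_mr_px] + 0.1 * rng.standard_normal((n_frames, n_mr_px))

# 1. Joint PCA over concatenated face+MR pixel vectors captures their
#    shared variation in a single set of components.
joint = np.hstack([face, mr])
pca = PCA(n_components=20).fit(joint)

# Split each component into its face part and its MR part.
comp_face = pca.components_[:, :n_face_px]  # shape (k, n_face_px)
comp_mr = pca.components_[:, n_face_px:]    # shape (k, n_mr_px)
mean_face, mean_mr = pca.mean_[:n_face_px], pca.mean_[n_face_px:]

# 2. Reconstruct MR frames from face frames alone: estimate component
#    scores by least squares against the face part of the loadings,
#    then project back out through the MR part.
scores, *_ = np.linalg.lstsq(comp_face.T, (face - mean_face).T, rcond=None)
mr_hat = scores.T @ comp_mr + mean_mr

# 3. Score fidelity per frame as the correlation between the actual
#    and reconstructed MR pixel vectors.
fidelity = np.array([np.corrcoef(a, b)[0, 1] for a, b in zip(mr, mr_hat)])
print(f"mean reconstruction fidelity r = {fidelity.mean():.2f}")

Estimating scores from the face half of the loadings alone is one simple way to realize "reconstruction from facial video only"; the paper's exact projection may differ.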
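
The "Bubbles" step can be sketched in the same spirit: random Gaussian apertures reveal parts of the face, each mask is weighted by the reconstruction fidelity it yields, and the weighted sum highlights informative regions. In this toy version the fidelity function is a placeholder (a hypothetical "informative" patch standing in for the real reconstruction score), and every name and parameter is an assumption.

import numpy as np

rng = np.random.default_rng(1)
h, w, n_trials = 32, 32, 500

# Hypothetical "informative" region (e.g., the mouth area) that the
# toy fidelity score rewards a mask for revealing.
informative = np.zeros((h, w), dtype=bool)
informative[18:26, 10:22] = True

def gaussian_bubbles(n_bubbles=5, sigma=3.0):
    # Random mask of Gaussian apertures ("bubbles"), values in [0, 1].
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w))
    for _ in range(n_bubbles):
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)
        mask = np.maximum(mask, np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * sigma**2)))
    return mask

def toy_fidelity(mask):
    # Placeholder: in the study this would be the fidelity of the MR
    # reconstruction obtained from the masked face frame.
    return mask[informative].mean()

# Accumulate masks weighted by the fidelity they yielded; regions that
# reliably support reconstruction end up with high weight.
acc, total = np.zeros((h, w)), 0.0
for _ in range(n_trials):
    m = gaussian_bubbles()
    s = toy_fidelity(m)
    acc += s * m
    total += s
classification_image = acc / total  # high where bubbles helped most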

Bibliographic Details
Main Authors: Scholes, Chris, Skipper, Jeremy I., Johnston, Alan
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2020
Subjects: Biological Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768679/
https://www.ncbi.nlm.nih.gov/pubmed/33293422
http://dx.doi.org/10.1073/pnas.2006192117
_version_ 1783629204022099968
author Scholes, Chris
Skipper, Jeremy I.
Johnston, Alan
author_facet Scholes, Chris
Skipper, Jeremy I.
Johnston, Alan
author_sort Scholes, Chris
collection PubMed
description It is well established that speech perception is improved when we are able to see the speaker talking along with hearing their voice, especially when the speech is noisy. While we have a good understanding of where speech integration occurs in the brain, it is unclear how visual and auditory cues are combined to improve speech perception. One suggestion is that integration can occur as both visual and auditory cues arise from a common generator: the vocal tract. Here, we investigate whether facial and vocal tract movements are linked during speech production by comparing videos of the face and fast magnetic resonance (MR) image sequences of the vocal tract. The joint variation in the face and vocal tract was extracted using an application of principal components analysis (PCA), and we demonstrate that MR image sequences can be reconstructed with high fidelity using only the facial video and PCA. Reconstruction fidelity was significantly higher when images from the two sequences corresponded in time, and including implicit temporal information by combining contiguous frames also led to a significant increase in fidelity. A “Bubbles” technique was used to identify which areas of the face were important for recovering information about the vocal tract, and vice versa, on a frame-by-frame basis. Our data reveal that there is sufficient information in the face to recover vocal tract shape during speech. In addition, the facial and vocal tract regions that are important for reconstruction are those that are used to generate the acoustic speech signal.
format Online
Article
Text
id pubmed-7768679
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-7768679 2021-01-11 The interrelationship between the face and vocal tract configuration during audiovisual speech Scholes, Chris Skipper, Jeremy I. Johnston, Alan Proc Natl Acad Sci U S A Biological Sciences It is well established that speech perception is improved when we are able to see the speaker talking along with hearing their voice, especially when the speech is noisy. While we have a good understanding of where speech integration occurs in the brain, it is unclear how visual and auditory cues are combined to improve speech perception. One suggestion is that integration can occur as both visual and auditory cues arise from a common generator: the vocal tract. Here, we investigate whether facial and vocal tract movements are linked during speech production by comparing videos of the face and fast magnetic resonance (MR) image sequences of the vocal tract. The joint variation in the face and vocal tract was extracted using an application of principal components analysis (PCA), and we demonstrate that MR image sequences can be reconstructed with high fidelity using only the facial video and PCA. Reconstruction fidelity was significantly higher when images from the two sequences corresponded in time, and including implicit temporal information by combining contiguous frames also led to a significant increase in fidelity. A “Bubbles” technique was used to identify which areas of the face were important for recovering information about the vocal tract, and vice versa, on a frame-by-frame basis. Our data reveal that there is sufficient information in the face to recover vocal tract shape during speech. In addition, the facial and vocal tract regions that are important for reconstruction are those that are used to generate the acoustic speech signal. National Academy of Sciences 2020-12-22 2020-12-08 /pmc/articles/PMC7768679/ /pubmed/33293422 http://dx.doi.org/10.1073/pnas.2006192117 Text en Copyright © 2020 the Author(s). Published by PNAS. This open access article is distributed under the Creative Commons Attribution License 4.0 (CC BY) (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Biological Sciences
Scholes, Chris
Skipper, Jeremy I.
Johnston, Alan
The interrelationship between the face and vocal tract configuration during audiovisual speech
title The interrelationship between the face and vocal tract configuration during audiovisual speech
title_full The interrelationship between the face and vocal tract configuration during audiovisual speech
title_fullStr The interrelationship between the face and vocal tract configuration during audiovisual speech
title_full_unstemmed The interrelationship between the face and vocal tract configuration during audiovisual speech
title_short The interrelationship between the face and vocal tract configuration during audiovisual speech
title_sort interrelationship between the face and vocal tract configuration during audiovisual speech
topic Biological Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768679/
https://www.ncbi.nlm.nih.gov/pubmed/33293422
http://dx.doi.org/10.1073/pnas.2006192117
work_keys_str_mv AT scholeschris theinterrelationshipbetweenthefaceandvocaltractconfigurationduringaudiovisualspeech
AT skipperjeremyi theinterrelationshipbetweenthefaceandvocaltractconfigurationduringaudiovisualspeech
AT johnstonalan theinterrelationshipbetweenthefaceandvocaltractconfigurationduringaudiovisualspeech
AT scholeschris interrelationshipbetweenthefaceandvocaltractconfigurationduringaudiovisualspeech
AT skipperjeremyi interrelationshipbetweenthefaceandvocaltractconfigurationduringaudiovisualspeech
AT johnstonalan interrelationshipbetweenthefaceandvocaltractconfigurationduringaudiovisualspeech