
Modulation transfer functions for audiovisual speech

Temporal synchrony between facial motion and acoustic modulations is a hallmark feature of audiovisual speech. The moving face and mouth during natural speech are known to be correlated with low-frequency acoustic envelope fluctuations (below 10 Hz), but the precise rates at which envelope informatio...


Bibliographic Details
Main authors:	Pedersen, Nicolai F., Dau, Torsten, Hansen, Lars Kai, Hjortkjær, Jens
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9295967/ https://www.ncbi.nlm.nih.gov/pubmed/35852989 http://dx.doi.org/10.1371/journal.pcbi.1010273

author	Pedersen, Nicolai F. Dau, Torsten Hansen, Lars Kai Hjortkjær, Jens
collection	PubMed
description	Temporal synchrony between facial motion and acoustic modulations is a hallmark feature of audiovisual speech. The moving face and mouth during natural speech are known to be correlated with low-frequency acoustic envelope fluctuations (below 10 Hz), but the precise rates at which envelope information is synchronized with motion in different parts of the face are less clear. Here, we used regularized canonical correlation analysis (rCCA) to learn speech envelope filters whose outputs correlate with motion in different parts of the speaker's face. We leveraged recent advances in video-based 3D facial landmark estimation, allowing us to examine statistical envelope-face correlations across a large number of speakers (∼4000). Specifically, rCCA was used to learn modulation transfer functions (MTFs) for the speech envelope that significantly predict correlation with facial motion across different speakers. The AV analysis revealed bandpass speech envelope filters at distinct temporal scales. A first set of MTFs showed peaks around 3-4 Hz and were correlated with mouth movements. A second set of MTFs captured envelope fluctuations in the 1-2 Hz range correlated with more global face and head motion. These two distinctive timescales emerged only as a property of natural AV speech statistics across many speakers. A similar analysis of fewer speakers performing a controlled speech task highlighted only the well-known temporal modulations around 4 Hz correlated with orofacial motion. The different bandpass ranges of AV correlation align notably with the average rates at which syllables (3-4 Hz) and phrases (1-2 Hz) are produced in natural speech. Whereas periodicities at the syllable rate are evident in the envelope spectrum of the speech signal itself, slower 1-2 Hz regularities thus only become prominent when considering crossmodal signal statistics. This may indicate a motor origin of temporal regularities at the timescales of syllables and phrases in natural speech.
format	Online Article Text
id	pubmed-9295967
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9295967 2022-07-20 Modulation transfer functions for audiovisual speech Pedersen, Nicolai F. Dau, Torsten Hansen, Lars Kai Hjortkjær, Jens PLoS Comput Biol Research Article Public Library of Science 2022-07-19 /pmc/articles/PMC9295967/ /pubmed/35852989 http://dx.doi.org/10.1371/journal.pcbi.1010273 Text en © 2022 Pedersen et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Modulation transfer functions for audiovisual speech
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9295967/ https://www.ncbi.nlm.nih.gov/pubmed/35852989 http://dx.doi.org/10.1371/journal.pcbi.1010273
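
The description above names regularized canonical correlation analysis (rCCA) as the method used to relate speech envelope features to facial motion. The following is a minimal sketch of a generic ridge-regularized CCA in NumPy, run on synthetic data. It is not the authors' implementation: the function names `inv_sqrt` and `rcca`, the regularization values, and the synthetic "envelope" and "face" matrices are all assumptions made for illustration.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def rcca(X, Y, reg_x=1e-3, reg_y=1e-3, n_components=2):
    """Ridge-regularized CCA (hypothetical helper, not from the paper).

    X: (n_samples, p) array, e.g. envelope features (assumed layout).
    Y: (n_samples, q) array, e.g. facial landmark motion (assumed layout).
    Returns projection weights for X and Y and the leading singular
    values of the whitened cross-covariance.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Ridge terms on the diagonal keep the covariances well conditioned.
    Cxx = X.T @ X / (n - 1) + reg_x * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg_y * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # The SVD of the whitened cross-covariance yields the canonical directions.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]


# Synthetic demo: both views share one latent signal plus independent noise.
rng = np.random.default_rng(0)
z = rng.standard_normal(2000)
X_demo = z[:, None] + 0.5 * rng.standard_normal((2000, 6))
Y_demo = z[:, None] + 0.5 * rng.standard_normal((2000, 4))
Wx_demo, Wy_demo, s_demo = rcca(X_demo, Y_demo)
print("leading regularized canonical correlations:", s_demo)
```

With larger `reg_x` and `reg_y` the returned singular values shrink below the sample correlations of the projected signals, which is the usual price of stabilizing the covariance estimates.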
