Causal inference of asynchronous audiovisual speech
During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception.
| Main Authors: | Magnotti, John F.; Ma, Wei Ji; Beauchamp, Michael S. |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Frontiers Media S.A., 2013 |
| Subjects: | Psychology |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3826594/ https://www.ncbi.nlm.nih.gov/pubmed/24294207 http://dx.doi.org/10.3389/fpsyg.2013.00798 |
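
The causal inference computation the abstract describes can be made concrete with a small sketch. The snippet below is a minimal illustration, not the paper's implementation: it assumes Gaussian sensory noise on the measured asynchrony, a narrow Gaussian distribution of true asynchronies under a common cause, a broad uniform distribution under separate causes, and a "respond synchronous when the common-cause posterior exceeds 0.5" decision rule. All parameter names and values (`sigma`, `sigma_c1`, `async_range`, etc.) are illustrative assumptions, not the authors' fitted estimates.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def p_common_cause(m, sigma, mu_c1=0.0, sigma_c1=100.0,
                   async_range=1000.0, prior_c1=0.5):
    """Posterior probability of a common cause given a measured
    audiovisual asynchrony m (ms). All parameter values are illustrative."""
    # C = 1: the true asynchrony is near zero; sensory noise adds in quadrature.
    like_c1 = norm.pdf(m, mu_c1, np.hypot(sigma_c1, sigma))
    # C = 2: the asynchrony is unconstrained, modeled as a broad uniform density.
    like_c2 = 1.0 / (2.0 * async_range)
    return like_c1 * prior_c1 / (like_c1 * prior_c1 + like_c2 * (1.0 - prior_c1))

def p_respond_synchronous(s, sigma, n=20000, **kwargs):
    """Predicted P('synchronous') at physical asynchrony s: the fraction of
    noisy measurements whose common-cause posterior exceeds 0.5."""
    m = rng.normal(s, sigma, n)  # simulate noisy internal measurements
    return float(np.mean(p_common_cause(m, sigma, **kwargs) > 0.5))

# Predicted psychometric function across stimulus asynchronies (ms)
for s in (-400, -200, 0, 200, 400):
    print(f"{s:+5d} ms -> P(sync) = {p_respond_synchronous(s, sigma=70.0):.2f}")
```

Fitting a curve like this to the observed rate of "synchronous" responses at each asynchrony is what gives the model's parameters their direct interpretation as stimulus and subject properties, in contrast to a post-hoc Gaussian fit to the same data.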
Field | Value
---|---
_version_ | 1782290929906876416 |
author | Magnotti, John F. Ma, Wei Ji Beauchamp, Michael S. |
author_facet | Magnotti, John F. Ma, Wei Ji Beauchamp, Michael S. |
author_sort | Magnotti, John F. |
collection | PubMed |
description | During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post-hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties. |
format | Online Article Text |
id | pubmed-3826594 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-3826594 2013-11-29 Causal inference of asynchronous audiovisual speech Magnotti, John F. Ma, Wei Ji Beauchamp, Michael S. Front Psychol Psychology During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post-hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties. Frontiers Media S.A. 2013-11-13 /pmc/articles/PMC3826594/ /pubmed/24294207 http://dx.doi.org/10.3389/fpsyg.2013.00798 Text en Copyright © 2013 Magnotti, Ma and Beauchamp. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Psychology Magnotti, John F. Ma, Wei Ji Beauchamp, Michael S. Causal inference of asynchronous audiovisual speech |
title | Causal inference of asynchronous audiovisual speech |
title_full | Causal inference of asynchronous audiovisual speech |
title_fullStr | Causal inference of asynchronous audiovisual speech |
title_full_unstemmed | Causal inference of asynchronous audiovisual speech |
title_short | Causal inference of asynchronous audiovisual speech |
title_sort | causal inference of asynchronous audiovisual speech |
topic | Psychology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3826594/ https://www.ncbi.nlm.nih.gov/pubmed/24294207 http://dx.doi.org/10.3389/fpsyg.2013.00798 |
work_keys_str_mv | AT magnottijohnf causalinferenceofasynchronousaudiovisualspeech AT maweiji causalinferenceofasynchronousaudiovisualspeech AT beauchampmichaels causalinferenceofasynchronousaudiovisualspeech |