
Visual speech discrimination and identification of natural and synthetic consonant stimuli


Bibliographic Details
Main Authors: Files, Benjamin T., Tjan, Bosco S., Jiang, Jintao, Bernstein, Lynne E.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4499841/
https://www.ncbi.nlm.nih.gov/pubmed/26217249
http://dx.doi.org/10.3389/fpsyg.2015.00878
description From phonetic features to connected discourse, every level of psycholinguistic structure including prosody can be perceived through viewing the talking face. Yet a longstanding notion in the literature is that visual speech perceptual categories comprise groups of phonemes (referred to as visemes), such as /p, b, m/ and /f, v/, whose internal structure is not informative to the visual speech perceiver. This conclusion has not to our knowledge been evaluated using a psychophysical discrimination paradigm. We hypothesized that perceivers can discriminate the phonemes within typical viseme groups, and that discrimination measured with d-prime (d’) and response latency is related to visual stimulus dissimilarities between consonant segments. In Experiment 1, participants performed speeded discrimination for pairs of consonant-vowel spoken nonsense syllables that were predicted to be same, near, or far in their perceptual distances, and that were presented as natural or synthesized video. Near pairs were within-viseme consonants. Natural within-viseme stimulus pairs were discriminated significantly above chance (except for /k/-/h/). Sensitivity (d’) increased and response times decreased with distance. Discrimination and identification were superior with natural stimuli, which comprised more phonetic information. We suggest that the notion of the viseme as a unitary perceptual category is incorrect. Experiment 2 probed the perceptual basis for visual speech discrimination by inverting the stimuli. Overall reductions in d’ with inverted stimuli but a persistent pattern of larger d’ for far than for near stimulus pairs are interpreted as evidence that visual speech is represented by both its motion and configural attributes. The methods and results of this investigation open up avenues for understanding the neural and perceptual bases for visual and audiovisual speech perception and for development of practical applications such as visual lipreading/speechreading speech synthesis.
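For readers unfamiliar with the sensitivity index d’ used throughout the abstract, a minimal sketch follows. This is not the authors' analysis code; it shows the textbook yes/no (independent-observation) formula d’ = Z(hit rate) − Z(false-alarm rate), where Z is the inverse of the standard normal CDF. The same-different design used in the study may call for a different decision model, so treat this as an illustration of the measure, not of the paper's exact computation.

```python
# Sensitivity index d' from hit and false-alarm proportions:
# d' = Z(H) - Z(FA), where Z is the inverse standard normal CDF.
# Chance performance (H == FA) gives d' = 0.
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Yes/no d' from hit and false-alarm rates (both strictly in (0, 1))."""
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)

# Example: 80% hits, 20% false alarms -> d' of about 1.68
print(round(d_prime(0.8, 0.2), 3))
```

In practice, rates of exactly 0 or 1 are first adjusted (e.g., with a log-linear correction), since the inverse normal CDF is undefined at those extremes.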
publishDate 2015-07-13
journal Front Psychol (Psychology)
license Copyright © 2015 Files, Tjan, Jiang and Bernstein. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Psychology