
Speech Driven Gaze in a Face-to-Face Interaction


Bibliographic Details
Main Authors: Arslan Aydin, Ülkü, Kalkan, Sinan, Acartürk, Cengiz
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7970197/
https://www.ncbi.nlm.nih.gov/pubmed/33746729
http://dx.doi.org/10.3389/fnbot.2021.598895
author Arslan Aydin, Ülkü
Kalkan, Sinan
Acartürk, Cengiz
author_facet Arslan Aydin, Ülkü
Kalkan, Sinan
Acartürk, Cengiz
author_sort Arslan Aydin, Ülkü
collection PubMed
description Gaze and language are major pillars of multimodal communication. Gaze is a non-verbal mechanism that conveys crucial social signals in face-to-face conversation. However, compared to language, gaze has been studied less as a communication modality. The purpose of the present study is twofold: (i) to investigate gaze direction (i.e., aversion and face gaze) and its relation to speech in face-to-face interaction; and (ii) to propose a computational model for multimodal communication that predicts gaze direction from high-level speech features. Twenty-eight pairs of participants took part in data collection. The experimental setting was a mock job interview. Eye movements were recorded for both participants. The speech data were annotated according to the ISO 24617-2 Standard for Dialogue Act Annotation, as well as with manual tags based on previous social gaze studies. A comparative analysis was conducted with Convolutional Neural Network (CNN) models employing two specific architectures, VGGNet and ResNet. The results showed that the frequency and duration of gaze differ significantly depending on the role of the participant. Moreover, the ResNet models achieved higher than 70% accuracy in predicting gaze direction.
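The abstract describes models that map high-level speech features (dialogue-act annotations over a time window) to a gaze-direction label (aversion vs. face gaze). A minimal sketch of how such CNN input could be encoded is given below; the tag set and window length are illustrative assumptions, not the paper's actual annotation scheme or pipeline:

```python
# Sketch: encode a window of dialogue-act tags as a binary matrix
# suitable as CNN input (rows = time steps, cols = one-hot tag slots).
# TAGS and the default window size are hypothetical placeholders.

TAGS = ["question", "answer", "feedback", "turn-keep", "turn-release"]
TAG_INDEX = {t: i for i, t in enumerate(TAGS)}

def encode_window(tag_sequence, window=4):
    """One-hot encode the last `window` dialogue-act tags.

    Shorter sequences are left-padded with all-zero rows, so the
    output is always a window x len(TAGS) matrix (list of lists).
    """
    rows = [[0] * len(TAGS) for _ in range(window)]
    recent = tag_sequence[-window:]
    offset = window - len(recent)
    for i, tag in enumerate(recent):
        rows[offset + i][TAG_INDEX[tag]] = 1
    return rows

# A 3-tag utterance history becomes a 4 x 5 matrix whose first row
# is zero padding; a CNN would classify such matrices into
# aversion vs. face gaze.
matrix = encode_window(["question", "answer", "feedback"])
```

A fixed-size matrix like this is what convolutional architectures such as VGGNet or ResNet expect; the actual feature set used in the study (ISO 24617-2 dialogue acts plus manual social-gaze tags) is richer than this toy example.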
format Online
Article
Text
id pubmed-7970197
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7970197 2021-03-19 Speech Driven Gaze in a Face-to-Face Interaction. Arslan Aydin, Ülkü; Kalkan, Sinan; Acartürk, Cengiz. Front Neurorobot (Neuroscience). Frontiers Media S.A., 2021-03-04. /pmc/articles/PMC7970197/ /pubmed/33746729 http://dx.doi.org/10.3389/fnbot.2021.598895 Text en Copyright © 2021 Arslan Aydin, Kalkan and Acartürk. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Arslan Aydin, Ülkü
Kalkan, Sinan
Acartürk, Cengiz
Speech Driven Gaze in a Face-to-Face Interaction
title Speech Driven Gaze in a Face-to-Face Interaction
title_full Speech Driven Gaze in a Face-to-Face Interaction
title_fullStr Speech Driven Gaze in a Face-to-Face Interaction
title_full_unstemmed Speech Driven Gaze in a Face-to-Face Interaction
title_short Speech Driven Gaze in a Face-to-Face Interaction
title_sort speech driven gaze in a face-to-face interaction
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7970197/
https://www.ncbi.nlm.nih.gov/pubmed/33746729
http://dx.doi.org/10.3389/fnbot.2021.598895
work_keys_str_mv AT arslanaydinulku speechdrivengazeinafacetofaceinteraction
AT kalkansinan speechdrivengazeinafacetofaceinteraction
AT acarturkcengiz speechdrivengazeinafacetofaceinteraction