Can You Tell Me where Wally Is?

Referring expression generation can be thought of as the converse problem to visual search: given a scene and a target, the participant's task is to generate a description which would allow somebody else to quickly and accurately locate the target. While this problem has been studied in psychol...

Bibliographic Details
Main Authors: Clarke, A D F, Elsner, M, Rohde, H
Format: Online Article Text
Language: English
Published: SAGE Publications 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5393640/
http://dx.doi.org/10.1068/ig6
_version_ 1783229590326476800
author Clarke, A D F
Elsner, M
Rohde, H
author_facet Clarke, A D F
Elsner, M
Rohde, H
author_sort Clarke, A D F
collection PubMed
description Referring expression generation can be thought of as the converse problem to visual search: given a scene and a target, the participant's task is to generate a description which would allow somebody else to quickly and accurately locate the target. While this problem has been studied in psycholinguistics and natural language processing, we believe that vision science also has a role to play. In particular, previous work on this problem is based on simple scenes consisting of a small number of objects and treats vision almost as a pre-process that extracts feature categories for each object in the scene. However, it is unlikely these models will scale: we know from the visual search literature that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. We hypothesize speakers will be sensitive to visual features allowing them to compose such ‘good’ descriptions. In the present study, we investigate how visual properties (salience, clutter, area and distance) influence REG using images from the “Where's Wally?” books [Handford 1987], which are an order of magnitude more complex than the stimuli traditionally used in REG experiments. We find that referring expressions for large salient targets are shorter than those for smaller and less salient targets, and that targets within highly cluttered scenes are described using more words. The choice of spatial relations also appears to be influenced by visual properties as participants show a preference for referencing large, salient landmarks that are in close proximity to the target.
format Online
Article
Text
id pubmed-5393640
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-53936402017-04-24 Can You Tell Me where Wally Is? Clarke, A D F Elsner, M Rohde, H Iperception Article Referring expression generation can be thought of as the converse problem to visual search: given a scene and a target, the participant's task is to generate a description which would allow somebody else to quickly and accurately locate the target. While this problem has been studied in psycholinguistics and natural language processing, we believe that vision science also has a role to play. In particular, previous work on this problem is based on simple scenes consisting of a small number of objects and treats vision almost as a pre-process that extracts feature categories for each object in the scene. However, it is unlikely these models will scale: we know from the visual search literature that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. We hypothesize speakers will be sensitive to visual features allowing them to compose such ‘good’ descriptions. In the present study, we investigate how visual properties (salience, clutter, area and distance) influence REG using images from the “Where's Wally?” books [Handford 1987], which are an order of magnitude more complex than the stimuli traditionally used in REG experiments. We find that referring expressions for large salient targets are shorter than those for smaller and less salient targets, and that targets within highly cluttered scenes are described using more words. The choice of spatial relations also appears to be influenced by visual properties as participants show a preference for referencing large, salient landmarks that are in close proximity to the target. SAGE Publications 2013-10-01 2013-10 /pmc/articles/PMC5393640/ http://dx.doi.org/10.1068/ig6 Text en © 2013 SAGE Publications Ltd.
Manuscript content on this site is licensed under Creative Commons Licenses http://creativecommons.org/licenses/by/3.0/ This article is distributed under the terms of the Creative Commons Attribution 3.0 License (http://www.creativecommons.org/licenses/by/3.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (http://www.uk.sagepub.com/aboutus/openaccess.htm).
spellingShingle Article
Clarke, A D F
Elsner, M
Rohde, H
Can You Tell Me where Wally Is?
title Can You Tell Me where Wally Is?
title_full Can You Tell Me where Wally Is?
title_fullStr Can You Tell Me where Wally Is?
title_full_unstemmed Can You Tell Me where Wally Is?
title_short Can You Tell Me where Wally Is?
title_sort can you tell me where wally is?
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5393640/
http://dx.doi.org/10.1068/ig6
work_keys_str_mv AT clarkeadf canyoutellmewherewallyis
AT elsnerm canyoutellmewherewallyis
AT rohdeh canyoutellmewherewallyis