
Rethinking symbolic and visual context in Referring Expression Generation

Situational context is crucial for linguistic reference to visible objects, since the same description can refer unambiguously to an object in one context but be ambiguous or misleading in others. This also applies to Referring Expression Generation (REG), where the production of identifying descriptions is always dependent on a given context. Research in REG has long represented visual domains through symbolic information about objects and their properties, in order to determine identifying sets of target features during content determination. In recent years, research in visual REG has turned to neural modeling and recast the REG task as an inherently multimodal problem, looking at more natural settings such as generating descriptions for objects in photographs. Characterizing the precise ways in which context influences generation is challenging in both paradigms, as context notoriously lacks precise definition and categorization. In multimodal settings, however, these problems are further exacerbated by the increased complexity and low-level representation of perceptual inputs. The main goal of this article is to provide a systematic review of the types and functions of visual context across existing approaches to REG, and to argue for integrating and extending the different perspectives on visual context that currently co-exist in REG research. By analyzing the ways in which rule-based approaches to symbolic REG integrate context, we derive a set of categories of contextual integration, including the distinction between positive and negative semantic forces exerted by context during reference generation. Using this as a framework, we show that existing work in visual REG has so far considered only some of the ways in which visual context can facilitate end-to-end reference generation. Connecting with preceding research in related areas, we highlight additional ways in which contextual integration can be incorporated into REG and other multimodal generation tasks, as possible directions for future research.
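As a minimal illustration of the kind of rule-based content determination discussed in the abstract, the Python sketch below shows how context can exert a negative semantic force: a target property is worth selecting only if it rules out at least one distractor in the scene. The scene, property names, and preference order are hypothetical, and the sketch is loosely in the spirit of classic incremental approaches to symbolic REG, not a reconstruction of any specific algorithm from the article.

def incremental_content_determination(target, distractors, preferred_attrs):
    """Greedily select properties of `target` until all `distractors`
    are ruled out; returns the chosen attribute-value pairs."""
    description = {}
    remaining = list(distractors)
    for attr in preferred_attrs:
        value = target.get(attr)
        if value is None:
            continue
        # Negative semantic force of context: a property is only added
        # if at least one distractor does NOT share it.
        if any(d.get(attr) != value for d in remaining):
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
        if not remaining:  # the referent is now unambiguous in this context
            break
    return description, remaining  # non-empty `remaining` means still ambiguous

# Hypothetical symbolic scene: the same target needs different descriptions
# depending on which distractors the context supplies.
target = {"type": "cup", "color": "red", "size": "small"}
context = [{"type": "cup", "color": "blue", "size": "small"},
           {"type": "bowl", "color": "red", "size": "large"}]
desc, left = incremental_content_determination(target, context, ["type", "color", "size"])
print(desc)  # {'type': 'cup', 'color': 'red'}, i.e., "the red cup"

With a different context, say two red cups of different sizes, the same function would instead select the size attribute, which is exactly the context sensitivity the article foregrounds.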


Bibliographic Details
Main Authors: Schüz, Simeon, Gatt, Albert, Zarrieß, Sina
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10072327/
https://www.ncbi.nlm.nih.gov/pubmed/37026020
http://dx.doi.org/10.3389/frai.2023.1067125
_version_ 1785019357896638464
author Schüz, Simeon
Gatt, Albert
Zarrieß, Sina
author_facet Schüz, Simeon
Gatt, Albert
Zarrieß, Sina
author_sort Schüz, Simeon
collection PubMed
description Situational context is crucial for linguistic reference to visible objects, since the same description can refer unambiguously to an object in one context but be ambiguous or misleading in others. This also applies to Referring Expression Generation (REG), where the production of identifying descriptions is always dependent on a given context. Research in REG has long represented visual domains through symbolic information about objects and their properties, in order to determine identifying sets of target features during content determination. In recent years, research in visual REG has turned to neural modeling and recast the REG task as an inherently multimodal problem, looking at more natural settings such as generating descriptions for objects in photographs. Characterizing the precise ways in which context influences generation is challenging in both paradigms, as context notoriously lacks precise definition and categorization. In multimodal settings, however, these problems are further exacerbated by the increased complexity and low-level representation of perceptual inputs. The main goal of this article is to provide a systematic review of the types and functions of visual context across existing approaches to REG, and to argue for integrating and extending the different perspectives on visual context that currently co-exist in REG research. By analyzing the ways in which rule-based approaches to symbolic REG integrate context, we derive a set of categories of contextual integration, including the distinction between positive and negative semantic forces exerted by context during reference generation. Using this as a framework, we show that existing work in visual REG has so far considered only some of the ways in which visual context can facilitate end-to-end reference generation. Connecting with preceding research in related areas, we highlight additional ways in which contextual integration can be incorporated into REG and other multimodal generation tasks, as possible directions for future research.
format Online
Article
Text
id pubmed-10072327
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10072327 2023-04-05 Rethinking symbolic and visual context in Referring Expression Generation Schüz, Simeon Gatt, Albert Zarrieß, Sina Front Artif Intell Artificial Intelligence Situational context is crucial for linguistic reference to visible objects, since the same description can refer unambiguously to an object in one context but be ambiguous or misleading in others. This also applies to Referring Expression Generation (REG), where the production of identifying descriptions is always dependent on a given context. Research in REG has long represented visual domains through symbolic information about objects and their properties, in order to determine identifying sets of target features during content determination. In recent years, research in visual REG has turned to neural modeling and recast the REG task as an inherently multimodal problem, looking at more natural settings such as generating descriptions for objects in photographs. Characterizing the precise ways in which context influences generation is challenging in both paradigms, as context notoriously lacks precise definition and categorization. In multimodal settings, however, these problems are further exacerbated by the increased complexity and low-level representation of perceptual inputs. The main goal of this article is to provide a systematic review of the types and functions of visual context across existing approaches to REG, and to argue for integrating and extending the different perspectives on visual context that currently co-exist in REG research. By analyzing the ways in which rule-based approaches to symbolic REG integrate context, we derive a set of categories of contextual integration, including the distinction between positive and negative semantic forces exerted by context during reference generation. Using this as a framework, we show that existing work in visual REG has so far considered only some of the ways in which visual context can facilitate end-to-end reference generation. Connecting with preceding research in related areas, we highlight additional ways in which contextual integration can be incorporated into REG and other multimodal generation tasks, as possible directions for future research. Frontiers Media S.A. 2023-03-21 /pmc/articles/PMC10072327/ /pubmed/37026020 http://dx.doi.org/10.3389/frai.2023.1067125 Text en Copyright © 2023 Schüz, Gatt and Zarrieß. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Artificial Intelligence
Schüz, Simeon
Gatt, Albert
Zarrieß, Sina
Rethinking symbolic and visual context in Referring Expression Generation
title Rethinking symbolic and visual context in Referring Expression Generation
title_full Rethinking symbolic and visual context in Referring Expression Generation
title_fullStr Rethinking symbolic and visual context in Referring Expression Generation
title_full_unstemmed Rethinking symbolic and visual context in Referring Expression Generation
title_short Rethinking symbolic and visual context in Referring Expression Generation
title_sort rethinking symbolic and visual context in referring expression generation
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10072327/
https://www.ncbi.nlm.nih.gov/pubmed/37026020
http://dx.doi.org/10.3389/frai.2023.1067125
work_keys_str_mv AT schuzsimeon rethinkingsymbolicandvisualcontextinreferringexpressiongeneration
AT gattalbert rethinkingsymbolicandvisualcontextinreferringexpressiongeneration
AT zarrießsina rethinkingsymbolicandvisualcontextinreferringexpressiongeneration