Linguistic issues behind visual question answering

Answering a question that is grounded in an image is a crucial ability that requires understanding the question, the visual context, and their interaction at many linguistic levels: among others, semantics, syntax and pragmatics. As such, visually‐grounded questions have long been of interest to the...

Bibliographic Details
Main authors:	Bernardi, Raffaella; Pezzelle, Sandro
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2021
Subjects:	Computational & Mathematical
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8244069/ https://www.ncbi.nlm.nih.gov/pubmed/34221111 http://dx.doi.org/10.1111/lnc3.12417

author	Bernardi, Raffaella Pezzelle, Sandro
collection	PubMed
description	Answering a question that is grounded in an image is a crucial ability that requires understanding the question, the visual context, and their interaction at many linguistic levels: among others, semantics, syntax and pragmatics. As such, visually‐grounded questions have long been of interest to theoretical linguists and cognitive scientists. Moreover, they have inspired the first attempts to computationally model natural language understanding, where pioneering systems were faced with the highly challenging task—still unsolved—of jointly dealing with syntax, semantics and inference whilst understanding a visual context. Boosted by impressive advancements in machine learning, the task of answering visually‐grounded questions has experienced a renewed interest in recent years, to the point of becoming a research sub‐field at the intersection of computational linguistics and computer vision. In this paper, we review current approaches to the problem which encompass the development of datasets, models and frameworks. We conduct our investigation from the perspective of the theoretical linguists; we extract from pioneering computational linguistic work a list of desiderata that we use to review current computational achievements. We acknowledge that impressive progress has been made to reconcile the engineering with the theoretical view. At the same time, we claim that further research is needed to get to a unified approach which jointly encompasses all the underlying linguistic problems. We conclude the paper by sharing our own desiderata for the future.
format	Online Article Text
id	pubmed-8244069
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8244069 2021-07-02 Linguistic issues behind visual question answering. Bernardi, Raffaella; Pezzelle, Sandro. Lang Linguist Compass. Computational & Mathematical. John Wiley and Sons Inc. 2021-06-04 2021-06 /pmc/articles/PMC8244069/ /pubmed/34221111 http://dx.doi.org/10.1111/lnc3.12417 Text en © 2021 The Authors. Language and Linguistics Compass published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title	Linguistic issues behind visual question answering
topic	Computational & Mathematical
