CE-BART: Cause-and-Effect BART for Visual Commonsense Generation

Bibliographic Details
Main authors: Kim, Junyeong; Hong, Ji Woo; Yoon, Sunjae; Yoo, Chang D.
Format: Online, Article, Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9736342/
https://www.ncbi.nlm.nih.gov/pubmed/36502101
http://dx.doi.org/10.3390/s22239399
collection PubMed
description “A picture is worth a thousand words.” Given an image, humans can deduce various cause-and-effect captions describing past, current, and future events beyond what the image shows. The task of visual commonsense generation aims to generate three cause-and-effect captions for a given image: (1) what needed to happen before, (2) what the current intent is, and (3) what will happen after. However, this task is challenging for machines owing to two limitations of existing approaches: they (1) directly utilize conventional vision–language transformers to learn relationships between input modalities and (2) ignore the relations among the target cause-and-effect captions, considering each caption independently. Herein, we propose Cause-and-Effect BART (CE-BART), which is based on (1) a structured graph reasoner that captures intra- and inter-modality relationships among visual and textual representations and (2) a cause-and-effect generator that generates cause-and-effect captions by considering the causal relations among inferences. We demonstrate the validity of CE-BART on the VisualCOMET and AVSD benchmarks. CE-BART achieved state-of-the-art (SOTA) performance on both benchmarks, while an extensive ablation study and qualitative analysis demonstrated the performance gain and improved interpretability.
id pubmed-9736342
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Sensors (Basel)
published 2022-12-02
license © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).