CE-BART: Cause-and-Effect BART for Visual Commonsense Generation
“A Picture is worth a thousand words”. Given an image, humans are able to deduce various cause-and-effect captions of past, current, and future events beyond the image. The task of visual commonsense generation has the aim of generating three cause-and-effect captions for a given image: (1) what nee...
Main authors: | Kim, Junyeong; Hong, Ji Woo; Yoon, Sunjae; Yoo, Chang D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9736342/ https://www.ncbi.nlm.nih.gov/pubmed/36502101 http://dx.doi.org/10.3390/s22239399 |
_version_ | 1784847001795428352 |
---|---|
author | Kim, Junyeong; Hong, Ji Woo; Yoon, Sunjae; Yoo, Chang D. |
author_facet | Kim, Junyeong; Hong, Ji Woo; Yoon, Sunjae; Yoo, Chang D. |
author_sort | Kim, Junyeong |
collection | PubMed |
description | “A Picture is worth a thousand words”. Given an image, humans are able to deduce various cause-and-effect captions of past, current, and future events beyond the image. The task of visual commonsense generation has the aim of generating three cause-and-effect captions for a given image: (1) what needed to happen before, (2) what is the current intent, and (3) what will happen after. However, this task is challenging for machines, owing to two limitations: existing approaches (1) directly utilize conventional vision–language transformers to learn relationships between input modalities and (2) ignore relations among target cause-and-effect captions, but consider each caption independently. Herein, we propose Cause-and-Effect BART (CE-BART), which is based on (1) a structured graph reasoner that captures intra- and inter-modality relationships among visual and textual representations and (2) a cause-and-effect generator that generates cause-and-effect captions by considering the causal relations among inferences. We demonstrate the validity of CE-BART on the VisualCOMET and AVSD benchmarks. CE-BART achieved SOTA performance on both benchmarks, while an extensive ablation study and qualitative analysis demonstrated the performance gain and improved interpretability. |
format | Online Article Text |
id | pubmed-9736342 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9736342 2022-12-11 CE-BART: Cause-and-Effect BART for Visual Commonsense Generation Kim, Junyeong Hong, Ji Woo Yoon, Sunjae Yoo, Chang D. Sensors (Basel) Article “A Picture is worth a thousand words”. Given an image, humans are able to deduce various cause-and-effect captions of past, current, and future events beyond the image. The task of visual commonsense generation has the aim of generating three cause-and-effect captions for a given image: (1) what needed to happen before, (2) what is the current intent, and (3) what will happen after. However, this task is challenging for machines, owing to two limitations: existing approaches (1) directly utilize conventional vision–language transformers to learn relationships between input modalities and (2) ignore relations among target cause-and-effect captions, but consider each caption independently. Herein, we propose Cause-and-Effect BART (CE-BART), which is based on (1) a structured graph reasoner that captures intra- and inter-modality relationships among visual and textual representations and (2) a cause-and-effect generator that generates cause-and-effect captions by considering the causal relations among inferences. We demonstrate the validity of CE-BART on the VisualCOMET and AVSD benchmarks. CE-BART achieved SOTA performance on both benchmarks, while an extensive ablation study and qualitative analysis demonstrated the performance gain and improved interpretability. MDPI 2022-12-02 /pmc/articles/PMC9736342/ /pubmed/36502101 http://dx.doi.org/10.3390/s22239399 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Kim, Junyeong Hong, Ji Woo Yoon, Sunjae Yoo, Chang D. CE-BART: Cause-and-Effect BART for Visual Commonsense Generation |
title | CE-BART: Cause-and-Effect BART for Visual Commonsense Generation |
title_full | CE-BART: Cause-and-Effect BART for Visual Commonsense Generation |
title_fullStr | CE-BART: Cause-and-Effect BART for Visual Commonsense Generation |
title_full_unstemmed | CE-BART: Cause-and-Effect BART for Visual Commonsense Generation |
title_short | CE-BART: Cause-and-Effect BART for Visual Commonsense Generation |
title_sort | ce-bart: cause-and-effect bart for visual commonsense generation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9736342/ https://www.ncbi.nlm.nih.gov/pubmed/36502101 http://dx.doi.org/10.3390/s22239399 |
work_keys_str_mv | AT kimjunyeong cebartcauseandeffectbartforvisualcommonsensegeneration AT hongjiwoo cebartcauseandeffectbartforvisualcommonsensegeneration AT yoonsunjae cebartcauseandeffectbartforvisualcommonsensegeneration AT yoochangd cebartcauseandeffectbartforvisualcommonsensegeneration |