Visual Relationship Detection with Multimodal Fusion and Reasoning
Visual relationship detection aims to completely understand visual scenes and has recently received increasing attention. However, current methods only use the visual features of images to train the semantic network, which does not match human habits in which we know obvious features of scenes and infer covert states using common sense. Therefore, these methods cannot predict some hidden relationships of object-pairs from complex scenes. To address this problem, we propose unifying vision–language fusion and knowledge graph reasoning to combine visual feature embedding with external common sense knowledge to determine the visual relationships of objects. In addition, before training the relationship detection network, we devise an object–pair proposal module to solve the combination explosion problem. Extensive experiments show that our proposed method outperforms the state-of-the-art methods on the Visual Genome and Visual Relationship Detection datasets.
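The abstract outlines two components: an object–pair proposal step that prunes the quadratic set of candidate object pairs, and a fusion of visual features with external knowledge-graph embeddings before predicate classification. The following is a minimal, hypothetical PyTorch sketch of that general idea; the class name, layer sizes, and top-k threshold are illustrative assumptions, not the authors' implementation (see the paper via the links below for details).

```python
# Minimal sketch (not the authors' code): fuse per-object visual features with
# knowledge-graph embeddings, score all subject-object pairs with a cheap
# "relatedness" head, and classify predicates only for the top-k pairs to
# avoid the combinatorial explosion. All names and dimensions are assumptions.
import torch
import torch.nn as nn

class PairProposalAndFusion(nn.Module):
    def __init__(self, vis_dim=1024, kg_dim=300, hidden=512, num_predicates=50):
        super().__init__()
        # Project visual and knowledge embeddings into a shared space.
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.kg_proj = nn.Linear(kg_dim, hidden)
        # Lightweight pair-proposal scorer (proposal stage).
        self.pair_scorer = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # Predicate classifier applied only to the surviving pairs.
        self.predicate_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_predicates))

    def forward(self, vis_feats, kg_embeds, top_k=100):
        # vis_feats: (N, vis_dim) detected-region features.
        # kg_embeds: (N, kg_dim) common-sense embeddings looked up per object class.
        fused = torch.relu(self.vis_proj(vis_feats) + self.kg_proj(kg_embeds))  # (N, hidden)
        n = fused.size(0)
        # Build all ordered subject-object pairs, excluding self-pairs.
        subj, obj = torch.meshgrid(torch.arange(n), torch.arange(n), indexing="ij")
        mask = subj.flatten() != obj.flatten()
        subj, obj = subj.flatten()[mask], obj.flatten()[mask]
        pair_feats = torch.cat([fused[subj], fused[obj]], dim=-1)  # (N*(N-1), 2*hidden)
        # Proposal stage: keep only the most promising pairs.
        scores = self.pair_scorer(pair_feats).squeeze(-1)
        keep = scores.topk(min(top_k, scores.numel())).indices
        # Relationship stage: classify predicates for the kept pairs only.
        logits = self.predicate_head(pair_feats[keep])
        return subj[keep], obj[keep], logits

# Example usage with random features for 10 detected objects.
model = PairProposalAndFusion()
s, o, logits = model(torch.randn(10, 1024), torch.randn(10, 300), top_k=20)
print(s.shape, o.shape, logits.shape)  # torch.Size([20]) torch.Size([20]) torch.Size([20, 50])
```

Running the heavier predicate classifier only on pairs that survive a cheap relatedness score is one common way to keep the pair set tractable; how the paper actually fuses language and knowledge-graph reasoning should be taken from the full text linked below.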
Main authors: | Xiao, Shouguan; Fu, Weiping |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9611296/ https://www.ncbi.nlm.nih.gov/pubmed/36298269 http://dx.doi.org/10.3390/s22207918 |
_version_ | 1784819490542845952 |
---|---|
author | Xiao, Shouguan; Fu, Weiping
author_facet | Xiao, Shouguan; Fu, Weiping
author_sort | Xiao, Shouguan |
collection | PubMed |
description | Visual relationship detection aims to completely understand visual scenes and has recently received increasing attention. However, current methods only use the visual features of images to train the semantic network, which does not match human habits in which we know obvious features of scenes and infer covert states using common sense. Therefore, these methods cannot predict some hidden relationships of object-pairs from complex scenes. To address this problem, we propose unifying vision–language fusion and knowledge graph reasoning to combine visual feature embedding with external common sense knowledge to determine the visual relationships of objects. In addition, before training the relationship detection network, we devise an object–pair proposal module to solve the combination explosion problem. Extensive experiments show that our proposed method outperforms the state-of-the-art methods on the Visual Genome and Visual Relationship Detection datasets. |
format | Online Article Text |
id | pubmed-9611296 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9611296 2022-10-28 Visual Relationship Detection with Multimodal Fusion and Reasoning Xiao, Shouguan; Fu, Weiping Sensors (Basel) Article Visual relationship detection aims to completely understand visual scenes and has recently received increasing attention. However, current methods only use the visual features of images to train the semantic network, which does not match human habits in which we know obvious features of scenes and infer covert states using common sense. Therefore, these methods cannot predict some hidden relationships of object-pairs from complex scenes. To address this problem, we propose unifying vision–language fusion and knowledge graph reasoning to combine visual feature embedding with external common sense knowledge to determine the visual relationships of objects. In addition, before training the relationship detection network, we devise an object–pair proposal module to solve the combination explosion problem. Extensive experiments show that our proposed method outperforms the state-of-the-art methods on the Visual Genome and Visual Relationship Detection datasets. MDPI 2022-10-18 /pmc/articles/PMC9611296/ /pubmed/36298269 http://dx.doi.org/10.3390/s22207918 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Xiao, Shouguan; Fu, Weiping Visual Relationship Detection with Multimodal Fusion and Reasoning
title | Visual Relationship Detection with Multimodal Fusion and Reasoning |
title_full | Visual Relationship Detection with Multimodal Fusion and Reasoning |
title_fullStr | Visual Relationship Detection with Multimodal Fusion and Reasoning |
title_full_unstemmed | Visual Relationship Detection with Multimodal Fusion and Reasoning |
title_short | Visual Relationship Detection with Multimodal Fusion and Reasoning |
title_sort | visual relationship detection with multimodal fusion and reasoning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9611296/ https://www.ncbi.nlm.nih.gov/pubmed/36298269 http://dx.doi.org/10.3390/s22207918 |
work_keys_str_mv | AT xiaoshouguan visualrelationshipdetectionwithmultimodalfusionandreasoning AT fuweiping visualrelationshipdetectionwithmultimodalfusionandreasoning |