
Visual Relationship Detection with Multimodal Fusion and Reasoning

Visual relationship detection aims to completely understand visual scenes and has recently received increasing attention. However, current methods only use the visual features of images to train the semantic network, which does not match human habits in which we know obvious features of scenes and infer covert states using common sense. Therefore, these methods cannot predict some hidden relationships of object-pairs from complex scenes. To address this problem, we propose unifying vision–language fusion and knowledge graph reasoning to combine visual feature embedding with external common sense knowledge to determine the visual relationships of objects. In addition, before training the relationship detection network, we devise an object–pair proposal module to solve the combination explosion problem. Extensive experiments show that our proposed method outperforms the state-of-the-art methods on the Visual Genome and Visual Relationship Detection datasets.


Bibliographic Details
Main Authors: Xiao, Shouguan; Fu, Weiping
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9611296/
https://www.ncbi.nlm.nih.gov/pubmed/36298269
http://dx.doi.org/10.3390/s22207918
_version_ 1784819490542845952
author Xiao, Shouguan
Fu, Weiping
author_facet Xiao, Shouguan
Fu, Weiping
author_sort Xiao, Shouguan
collection PubMed
description Visual relationship detection aims to completely understand visual scenes and has recently received increasing attention. However, current methods only use the visual features of images to train the semantic network, which does not match human habits in which we know obvious features of scenes and infer covert states using common sense. Therefore, these methods cannot predict some hidden relationships of object-pairs from complex scenes. To address this problem, we propose unifying vision–language fusion and knowledge graph reasoning to combine visual feature embedding with external common sense knowledge to determine the visual relationships of objects. In addition, before training the relationship detection network, we devise an object–pair proposal module to solve the combination explosion problem. Extensive experiments show that our proposed method outperforms the state-of-the-art methods on the Visual Genome and Visual Relationship Detection datasets.
format Online
Article
Text
id pubmed-9611296
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9611296 2022-10-28 Visual Relationship Detection with Multimodal Fusion and Reasoning Xiao, Shouguan Fu, Weiping Sensors (Basel) Article Visual relationship detection aims to completely understand visual scenes and has recently received increasing attention. However, current methods only use the visual features of images to train the semantic network, which does not match human habits in which we know obvious features of scenes and infer covert states using common sense. Therefore, these methods cannot predict some hidden relationships of object-pairs from complex scenes. To address this problem, we propose unifying vision–language fusion and knowledge graph reasoning to combine visual feature embedding with external common sense knowledge to determine the visual relationships of objects. In addition, before training the relationship detection network, we devise an object–pair proposal module to solve the combination explosion problem. Extensive experiments show that our proposed method outperforms the state-of-the-art methods on the Visual Genome and Visual Relationship Detection datasets. MDPI 2022-10-18 /pmc/articles/PMC9611296/ /pubmed/36298269 http://dx.doi.org/10.3390/s22207918 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Xiao, Shouguan
Fu, Weiping
Visual Relationship Detection with Multimodal Fusion and Reasoning
title Visual Relationship Detection with Multimodal Fusion and Reasoning
title_full Visual Relationship Detection with Multimodal Fusion and Reasoning
title_fullStr Visual Relationship Detection with Multimodal Fusion and Reasoning
title_full_unstemmed Visual Relationship Detection with Multimodal Fusion and Reasoning
title_short Visual Relationship Detection with Multimodal Fusion and Reasoning
title_sort visual relationship detection with multimodal fusion and reasoning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9611296/
https://www.ncbi.nlm.nih.gov/pubmed/36298269
http://dx.doi.org/10.3390/s22207918
work_keys_str_mv AT xiaoshouguan visualrelationshipdetectionwithmultimodalfusionandreasoning
AT fuweiping visualrelationshipdetectionwithmultimodalfusionandreasoning