Cross-Modal Object Detection Based on a Knowledge Update
As an important field of computer vision, object detection has been studied extensively in recent years. However, existing object detection methods merely utilize the visual information of the image and fail to mine the high-level semantic information of the object, which leads to great limitations....
Main Authors: | Gao, Yueqing; Zhou, Huachun; Chen, Lulu; Shen, Yuting; Guo, Ce; Zhang, Xinyu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8963053/ https://www.ncbi.nlm.nih.gov/pubmed/35214240 http://dx.doi.org/10.3390/s22041338 |
field | value
---|---
_version_ | 1784677910375825408
author | Gao, Yueqing; Zhou, Huachun; Chen, Lulu; Shen, Yuting; Guo, Ce; Zhang, Xinyu
author_facet | Gao, Yueqing; Zhou, Huachun; Chen, Lulu; Shen, Yuting; Guo, Ce; Zhang, Xinyu
author_sort | Gao, Yueqing |
collection | PubMed |
description | As an important field of computer vision, object detection has been studied extensively in recent years. However, existing object detection methods merely utilize the visual information of the image and fail to mine the high-level semantic information of the object, which leads to great limitations. To take full advantage of multi-source information, a knowledge update-based multimodal object recognition model is proposed in this paper. Specifically, our method initially uses Faster R-CNN to regionalize the image, then applies a transformer-based multimodal encoder to encode visual region features (region-based image features) and textual features (semantic relationships between words) corresponding to the images. After that, a graph convolutional network (GCN) inference module is introduced to establish a relational network in which the nodes denote visual and textual region features, and the edges represent their relationships. In addition, based on an external knowledge base, our method further enhances the region-based relationship expression capability through a knowledge update module. In summary, the proposed algorithm not only learns the accurate relationship between objects in different regions of the image, but also benefits from the knowledge update through an external relational database. Experimental results verify the effectiveness of the proposed knowledge update module and the independent reasoning ability of our model. |
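The GCN inference step described in the abstract — a graph whose nodes are visual region features and word features, and whose edges encode their relationships — can be illustrated with a minimal sketch. This is not the authors' implementation; the adjacency matrix, feature dimensions, and values below are all hypothetical, and the layer follows the standard symmetric-normalized graph convolution (norm(A) · H · W with a ReLU).

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: normalized adjacency x features x weight.

    adj    : (n, n) 0/1 adjacency over region/word nodes (hypothetical graph)
    feats  : (n, d) node features (visual region or word embeddings)
    weight : (d, d_out) learnable projection
    """
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)             # symmetric degree normalization
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(norm @ feats @ weight, 0.0)  # ReLU activation

# Toy graph: 3 visual-region nodes + 2 word nodes, purely illustrative.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 1, 0],
                [1, 0, 1, 0, 1],
                [0, 1, 0, 0, 1],
                [1, 0, 0, 0, 1],
                [0, 1, 1, 1, 0]], dtype=float)
feats = rng.normal(size=(5, 8))    # 8-dim node features
weight = rng.normal(size=(8, 4))   # project to 4 dims
out = gcn_layer(adj, feats, weight)
print(out.shape)  # (5, 4): updated features for every region/word node
```

Each layer mixes a node's features with those of its neighbors, which is how relationships between image regions and words propagate through the relational network; stacking layers widens the receptive field over the graph.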
format | Online Article Text |
id | pubmed-8963053 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8963053 2022-03-30 Cross-Modal Object Detection Based on a Knowledge Update. Gao, Yueqing; Zhou, Huachun; Chen, Lulu; Shen, Yuting; Guo, Ce; Zhang, Xinyu. Sensors (Basel), Article. MDPI 2022-02-10 /pmc/articles/PMC8963053/ /pubmed/35214240 http://dx.doi.org/10.3390/s22041338 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article; Gao, Yueqing; Zhou, Huachun; Chen, Lulu; Shen, Yuting; Guo, Ce; Zhang, Xinyu; Cross-Modal Object Detection Based on a Knowledge Update |
title | Cross-Modal Object Detection Based on a Knowledge Update |
title_full | Cross-Modal Object Detection Based on a Knowledge Update |
title_fullStr | Cross-Modal Object Detection Based on a Knowledge Update |
title_full_unstemmed | Cross-Modal Object Detection Based on a Knowledge Update |
title_short | Cross-Modal Object Detection Based on a Knowledge Update |
title_sort | cross-modal object detection based on a knowledge update |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8963053/ https://www.ncbi.nlm.nih.gov/pubmed/35214240 http://dx.doi.org/10.3390/s22041338 |
work_keys_str_mv | AT gaoyueqing crossmodalobjectdetectionbasedonaknowledgeupdate AT zhouhuachun crossmodalobjectdetectionbasedonaknowledgeupdate AT chenlulu crossmodalobjectdetectionbasedonaknowledgeupdate AT shenyuting crossmodalobjectdetectionbasedonaknowledgeupdate AT guoce crossmodalobjectdetectionbasedonaknowledgeupdate AT zhangxinyu crossmodalobjectdetectionbasedonaknowledgeupdate |