BPI-MVQA: a bi-branch model for medical visual question answering
BACKGROUND: Visual question answering in the medical domain (VQA-Med) exhibits great potential for enhancing confidence in diagnosing diseases and helping patients better understand their medical conditions. One of the challenges in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays, Magnetic Resonance Imaging (MRI)) and answer the corresponding questions accurately in unlabeled medical datasets.
Main Authors: | Liu, Shengyan; Zhang, Xuejie; Zhou, Xiaobing; Yang, Jian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052498/ https://www.ncbi.nlm.nih.gov/pubmed/35488285 http://dx.doi.org/10.1186/s12880-022-00800-x |
_version_ | 1784696796132409344 |
---|---|
author | Liu, Shengyan; Zhang, Xuejie; Zhou, Xiaobing; Yang, Jian |
author_facet | Liu, Shengyan; Zhang, Xuejie; Zhou, Xiaobing; Yang, Jian |
author_sort | Liu, Shengyan |
collection | PubMed |
description | BACKGROUND: Visual question answering in the medical domain (VQA-Med) exhibits great potential for enhancing confidence in diagnosing diseases and helping patients better understand their medical conditions. One of the challenges in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays, Magnetic Resonance Imaging (MRI)) and answer the corresponding questions accurately in unlabeled medical datasets. METHOD: We propose a novel Bi-branched model based on Parallel networks and Image retrieval for Medical Visual Question Answering (BPI-MVQA). The first branch of BPI-MVQA is a transformer structure built on a parallel network that combines the complementary strengths of image sequence-feature and spatial-feature extraction; multi-modal features are implicitly fused through the multi-head self-attention mechanism. The second branch retrieves images with similar feature vectors, generated by a VGG16 network, and uses their text descriptions as labels. RESULT: The BPI-MVQA model achieves state-of-the-art results on three VQA-Med datasets, with main metric scores exceeding the previous best results by 0.2%, 1.4%, and 1.1%. CONCLUSION: The evaluation results support the effectiveness of the BPI-MVQA model in VQA-Med. The design of the bi-branch structure helps the model answer different types of visual questions. The parallel network enables multi-angle image feature extraction, a distinctive approach that helps the model better understand the semantic information of the image and achieve higher accuracy in the multi-classification of VQA-Med. In addition, image retrieval helps the model answer irregular, open-ended questions by drawing on the information provided by similar images.
The comparison with state-of-the-art methods on three datasets also shows that our method brings substantial improvement to the VQA-Med system. |
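The retrieval branch described above can be sketched minimally: given one feature vector per image (the paper obtains these from a VGG16 network; here they are toy stand-in vectors), find the gallery image most similar to the query by cosine similarity and reuse its text description as the label. Everything below — the feature dimensionality, the example data, and the function names — is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def cosine_similarities(query, gallery):
    """Cosine similarity between a query vector and each row of a gallery matrix."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q

def retrieve_description(query_feat, gallery_feats, descriptions):
    """Return the description of the gallery image whose feature vector
    (e.g., a precomputed VGG16 embedding) is closest to the query's."""
    sims = cosine_similarities(query_feat, gallery_feats)
    return descriptions[int(np.argmax(sims))]

# Toy 4-d "features" standing in for real VGG16 embeddings.
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.7, 0.7, 0.0, 0.0]])
labels = ["chest x-ray, no acute findings",
          "brain mri, t2-weighted",
          "abdominal ct, contrast-enhanced"]
query = np.array([0.9, 0.1, 0.0, 0.0])
print(retrieve_description(query, gallery, labels))  # → chest x-ray, no acute findings
```

In the full system, the retrieved description would then serve as the answer label for open-ended questions rather than being predicted by a classifier.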
format | Online Article Text |
id | pubmed-9052498 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-90524982022-04-30 BPI-MVQA: a bi-branch model for medical visual question answering Liu, Shengyan; Zhang, Xuejie; Zhou, Xiaobing; Yang, Jian BMC Med Imaging Research [Abstract as in the description field above.] BioMed Central 2022-04-29 /pmc/articles/PMC9052498/ /pubmed/35488285 http://dx.doi.org/10.1186/s12880-022-00800-x Text en © The Author(s) 2022. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Liu, Shengyan Zhang, Xuejie Zhou, Xiaobing Yang, Jian BPI-MVQA: a bi-branch model for medical visual question answering |
title | BPI-MVQA: a bi-branch model for medical visual question answering |
title_full | BPI-MVQA: a bi-branch model for medical visual question answering |
title_fullStr | BPI-MVQA: a bi-branch model for medical visual question answering |
title_full_unstemmed | BPI-MVQA: a bi-branch model for medical visual question answering |
title_short | BPI-MVQA: a bi-branch model for medical visual question answering |
title_sort | bpi-mvqa: a bi-branch model for medical visual question answering |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052498/ https://www.ncbi.nlm.nih.gov/pubmed/35488285 http://dx.doi.org/10.1186/s12880-022-00800-x |
work_keys_str_mv | AT liushengyan bpimvqaabibranchmodelformedicalvisualquestionanswering AT zhangxuejie bpimvqaabibranchmodelformedicalvisualquestionanswering AT zhouxiaobing bpimvqaabibranchmodelformedicalvisualquestionanswering AT yangjian bpimvqaabibranchmodelformedicalvisualquestionanswering |