
BPI-MVQA: a bi-branch model for medical visual question answering


Bibliographic Details
Main Authors: Liu, Shengyan, Zhang, Xuejie, Zhou, Xiaobing, Yang, Jian
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052498/
https://www.ncbi.nlm.nih.gov/pubmed/35488285
http://dx.doi.org/10.1186/s12880-022-00800-x
_version_ 1784696796132409344
author Liu, Shengyan
Zhang, Xuejie
Zhou, Xiaobing
Yang, Jian
author_facet Liu, Shengyan
Zhang, Xuejie
Zhou, Xiaobing
Yang, Jian
author_sort Liu, Shengyan
collection PubMed
description BACKGROUND: Visual question answering in the medical domain (VQA-Med) exhibits great potential for enhancing confidence in diagnosing diseases and helping patients better understand their medical conditions. One of the challenges in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays, magnetic resonance imaging (MRI)) and answer the corresponding questions accurately on unlabeled medical datasets. METHOD: We propose a novel bi-branched model based on parallel networks and image retrieval for medical visual question answering (BPI-MVQA). The first branch of BPI-MVQA is a transformer structure built on a parallel network that achieves complementary advantages in extracting image sequence features and spatial features, and multi-modal features are implicitly fused by the multi-head self-attention mechanism. The second branch retrieves similar text descriptions to use as labels by matching image features generated by a VGG16 network. RESULT: The BPI-MVQA model achieves state-of-the-art results on three VQA-Med datasets, exceeding the previous best main metric scores by 0.2%, 1.4%, and 1.1%, respectively. CONCLUSION: The evaluation results support the effectiveness of the BPI-MVQA model in VQA-Med. The bi-branch design helps the model answer different types of visual questions. The parallel network allows for multi-angle image feature extraction, a distinctive feature extraction method that helps the model better understand the semantic information of the image and achieve greater accuracy in the multi-classification of VQA-Med. In addition, image retrieval helps the model answer irregular, open-ended questions from the perspective of understanding the information provided by images.
The comparison of our method with state-of-the-art methods on three datasets also shows that our method brings substantial improvement to the VQA-Med system.
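The retrieval branch described above can be illustrated with a minimal sketch: given a feature vector for the query image (e.g., extracted by a VGG16 backbone) and a gallery of feature vectors with their associated text descriptions, return the description of the most similar gallery image. This is not the authors' implementation; cosine similarity, the function name `retrieve_answer`, and the toy data are illustrative assumptions.

```python
import numpy as np

def retrieve_answer(query_feat, gallery_feats, gallery_texts):
    """Return the text description of the gallery image whose feature
    vector is most similar (by cosine similarity) to the query.
    Cosine similarity is an assumed metric, not necessarily the paper's."""
    # Normalize all feature vectors to unit length.
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    # After normalization, cosine similarity reduces to a dot product.
    sims = g @ q
    # Pick the description attached to the best-matching gallery image.
    return gallery_texts[int(np.argmax(sims))]

# Toy example: 3 "images" represented by 4-dimensional features
# (hypothetical stand-ins for VGG16 embeddings).
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.7, 0.7, 0.0, 0.0]])
texts = ["chest x-ray, no finding", "brain mri, lesion", "chest x-ray, effusion"]
query = np.array([0.9, 0.1, 0.0, 0.0])
print(retrieve_answer(query, gallery, texts))  # → chest x-ray, no finding
```

In practice the gallery features would be precomputed over the training images, so answering an open-ended question costs one matrix–vector product and an argmax.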
format Online
Article
Text
id pubmed-9052498
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9052498 2022-04-30 BPI-MVQA: a bi-branch model for medical visual question answering Liu, Shengyan Zhang, Xuejie Zhou, Xiaobing Yang, Jian BMC Med Imaging Research BioMed Central 2022-04-29 /pmc/articles/PMC9052498/ /pubmed/35488285 http://dx.doi.org/10.1186/s12880-022-00800-x Text en © The Author(s) 2022. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Research
Liu, Shengyan
Zhang, Xuejie
Zhou, Xiaobing
Yang, Jian
BPI-MVQA: a bi-branch model for medical visual question answering
title BPI-MVQA: a bi-branch model for medical visual question answering
title_full BPI-MVQA: a bi-branch model for medical visual question answering
title_fullStr BPI-MVQA: a bi-branch model for medical visual question answering
title_full_unstemmed BPI-MVQA: a bi-branch model for medical visual question answering
title_short BPI-MVQA: a bi-branch model for medical visual question answering
title_sort bpi-mvqa: a bi-branch model for medical visual question answering
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052498/
https://www.ncbi.nlm.nih.gov/pubmed/35488285
http://dx.doi.org/10.1186/s12880-022-00800-x
work_keys_str_mv AT liushengyan bpimvqaabibranchmodelformedicalvisualquestionanswering
AT zhangxuejie bpimvqaabibranchmodelformedicalvisualquestionanswering
AT zhouxiaobing bpimvqaabibranchmodelformedicalvisualquestionanswering
AT yangjian bpimvqaabibranchmodelformedicalvisualquestionanswering