BPI-MVQA: a bi-branch model for medical visual question answering
BACKGROUND: Visual question answering in the medical domain (VQA-Med) exhibits great potential for enhancing confidence in diagnosing diseases and helping patients better understand their medical conditions. One of the challenges in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays, Magnetic Resonance Imaging (MRI)) and answer the corresponding questions accurately in unlabeled medical datasets.
Main Authors: | Liu, Shengyan; Zhang, Xuejie; Zhou, Xiaobing; Yang, Jian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052498/ https://www.ncbi.nlm.nih.gov/pubmed/35488285 http://dx.doi.org/10.1186/s12880-022-00800-x |
_version_ | 1784696796132409344 |
---|---|
author | Liu, Shengyan; Zhang, Xuejie; Zhou, Xiaobing; Yang, Jian |
author_facet | Liu, Shengyan; Zhang, Xuejie; Zhou, Xiaobing; Yang, Jian |
author_sort | Liu, Shengyan |
collection | PubMed |
description | BACKGROUND: Visual question answering in the medical domain (VQA-Med) exhibits great potential for enhancing confidence in diagnosing diseases and helping patients better understand their medical conditions. One of the challenges in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays, Magnetic Resonance Imaging (MRI)) and answer the corresponding questions accurately in unlabeled medical datasets. METHOD: We propose a novel Bi-branched model based on Parallel networks and Image retrieval for Medical Visual Question Answering (BPI-MVQA). The first branch of BPI-MVQA is a transformer structure built on a parallel network that combines the complementary strengths of image sequence-feature and spatial-feature extraction; multi-modal features are implicitly fused through the multi-head self-attention mechanism. The second branch retrieves images with similar feature vectors, generated by a VGG16 network, and uses their text descriptions as labels. RESULT: The BPI-MVQA model achieves state-of-the-art results on three VQA-Med datasets, with main metric scores exceeding the previous best results by 0.2%, 1.4%, and 1.1%. CONCLUSION: The evaluation results support the effectiveness of the BPI-MVQA model in VQA-Med. The design of the bi-branch structure helps the model answer different types of visual questions. The parallel network enables multi-angle image feature extraction, a distinctive approach that helps the model better understand the semantic information of the image and achieve higher accuracy in the multi-classification of VQA-Med. In addition, image retrieval helps the model answer irregular, open-ended questions by drawing on the information provided by similar images.
The comparison with state-of-the-art methods on three datasets also shows that our method brings substantial improvement to the VQA-Med system. |
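The retrieval branch described above can be sketched minimally: given one feature vector per image (the paper obtains these from a VGG16 network; here they are toy stand-in vectors), find the gallery image most similar to the query by cosine similarity and reuse its text description as the label. Everything below — the feature dimensionality, the example data, and the function names — is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def cosine_similarities(query, gallery):
    """Cosine similarity between a query vector and each row of a gallery matrix."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q

def retrieve_description(query_feat, gallery_feats, descriptions):
    """Return the description of the gallery image whose feature vector
    (e.g., a precomputed VGG16 embedding) is closest to the query's."""
    sims = cosine_similarities(query_feat, gallery_feats)
    return descriptions[int(np.argmax(sims))]

# Toy 4-d "features" standing in for real VGG16 embeddings.
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.7, 0.7, 0.0, 0.0]])
labels = ["chest x-ray, no acute findings",
          "brain mri, t2-weighted",
          "abdominal ct, contrast-enhanced"]
query = np.array([0.9, 0.1, 0.0, 0.0])
print(retrieve_description(query, gallery, labels))  # → chest x-ray, no acute findings
```

In the full system, the retrieved description would then serve as the answer label for open-ended questions rather than being predicted by a classifier.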
format | Online Article Text |
id | pubmed-9052498 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-90524982022-04-30 BPI-MVQA: a bi-branch model for medical visual question answering Liu, Shengyan; Zhang, Xuejie; Zhou, Xiaobing; Yang, Jian BMC Med Imaging Research [Abstract as in the description field above.] BioMed Central 2022-04-29 /pmc/articles/PMC9052498/ /pubmed/35488285 http://dx.doi.org/10.1186/s12880-022-00800-x Text en © The Author(s) 2022. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Liu, Shengyan Zhang, Xuejie Zhou, Xiaobing Yang, Jian BPI-MVQA: a bi-branch model for medical visual question answering |
title | BPI-MVQA: a bi-branch model for medical visual question answering |
title_full | BPI-MVQA: a bi-branch model for medical visual question answering |
title_fullStr | BPI-MVQA: a bi-branch model for medical visual question answering |
title_full_unstemmed | BPI-MVQA: a bi-branch model for medical visual question answering |
title_short | BPI-MVQA: a bi-branch model for medical visual question answering |
title_sort | bpi-mvqa: a bi-branch model for medical visual question answering |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052498/ https://www.ncbi.nlm.nih.gov/pubmed/35488285 http://dx.doi.org/10.1186/s12880-022-00800-x |
work_keys_str_mv | AT liushengyan bpimvqaabibranchmodelformedicalvisualquestionanswering AT zhangxuejie bpimvqaabibranchmodelformedicalvisualquestionanswering AT zhouxiaobing bpimvqaabibranchmodelformedicalvisualquestionanswering AT yangjian bpimvqaabibranchmodelformedicalvisualquestionanswering |