MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain

Medical images are difficult to comprehend for a person without expertise. Medical practitioners, who are scarce across the globe, often face physical and mental fatigue due to the high number of cases, which induces human errors during diagnosis. In such scenarios, having an additional o...

Bibliographic Details
Main authors:	Sharma, Dhruv, Purushotham, Sanjay, Reddy, Chandan K.
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494920/ https://www.ncbi.nlm.nih.gov/pubmed/34615894 http://dx.doi.org/10.1038/s41598-021-98390-1

_version_	1784579416732467200
author	Sharma, Dhruv Purushotham, Sanjay Reddy, Chandan K.
author_facet	Sharma, Dhruv Purushotham, Sanjay Reddy, Chandan K.
author_sort	Sharma, Dhruv
collection	PubMed
description	Medical images are difficult to comprehend for a person without expertise. Medical practitioners, who are scarce across the globe, often face physical and mental fatigue due to the high number of cases, which induces human errors during diagnosis. In such scenarios, having an additional opinion can be helpful in boosting the confidence of the decision maker. Thus, it becomes crucial to have a reliable visual question answering (VQA) system to provide a ‘second opinion’ on medical cases. However, most of the VQA systems that work today cater to real-world problems and are not specifically tailored for handling medical images. Moreover, a VQA system for medical images needs to account for the limited amount of training data available in this domain. In this paper, we develop MedFuseNet, an attention-based multimodal deep learning model, for VQA on medical images, taking the associated challenges into account. MedFuseNet aims to maximize learning with minimal complexity by breaking the problem statement into simpler tasks and predicting the answer. We tackle two types of answer prediction: categorization and generation. We conducted an extensive set of quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our experiments demonstrate that MedFuseNet outperforms the state-of-the-art VQA methods, and that visualization of the captured attentions showcases the interpretability of our model’s predicted results.
format	Online Article Text
id	pubmed-8494920
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-8494920 2021-10-08 MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain Sharma, Dhruv Purushotham, Sanjay Reddy, Chandan K. Sci Rep Article Medical images are difficult to comprehend for a person without expertise. Medical practitioners, who are scarce across the globe, often face physical and mental fatigue due to the high number of cases, which induces human errors during diagnosis. In such scenarios, having an additional opinion can be helpful in boosting the confidence of the decision maker. Thus, it becomes crucial to have a reliable visual question answering (VQA) system to provide a ‘second opinion’ on medical cases. However, most of the VQA systems that work today cater to real-world problems and are not specifically tailored for handling medical images. Moreover, a VQA system for medical images needs to account for the limited amount of training data available in this domain. In this paper, we develop MedFuseNet, an attention-based multimodal deep learning model, for VQA on medical images, taking the associated challenges into account. MedFuseNet aims to maximize learning with minimal complexity by breaking the problem statement into simpler tasks and predicting the answer. We tackle two types of answer prediction: categorization and generation. We conducted an extensive set of quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our experiments demonstrate that MedFuseNet outperforms the state-of-the-art VQA methods, and that visualization of the captured attentions showcases the interpretability of our model’s predicted results.
Nature Publishing Group UK 2021-10-06 /pmc/articles/PMC8494920/ /pubmed/34615894 http://dx.doi.org/10.1038/s41598-021-98390-1 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Article Sharma, Dhruv Purushotham, Sanjay Reddy, Chandan K. MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain
title	MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain
title_full	MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain
title_fullStr	MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain
title_full_unstemmed	MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain
title_short	MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain
title_sort	medfusenet: an attention-based multimodal deep learning model for visual question answering in the medical domain
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494920/ https://www.ncbi.nlm.nih.gov/pubmed/34615894 http://dx.doi.org/10.1038/s41598-021-98390-1
work_keys_str_mv	AT sharmadhruv medfusenetanattentionbasedmultimodaldeeplearningmodelforvisualquestionansweringinthemedicaldomain AT purushothamsanjay medfusenetanattentionbasedmultimodaldeeplearningmodelforvisualquestionansweringinthemedicaldomain AT reddychandank medfusenetanattentionbasedmultimodaldeeplearningmodelforvisualquestionansweringinthemedicaldomain

Similar Items