A dataset for medical instructional video classification and question answering

This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medica...

Bibliographic Details
Main Authors: Gupta, Deepak; Attal, Kush; Demner-Fushman, Dina
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10031721/
https://www.ncbi.nlm.nih.gov/pubmed/36949119
http://dx.doi.org/10.1038/s41597-023-02036-y
_version_ 1784910666528718848
author Gupta, Deepak
Attal, Kush
Demner-Fushman, Dina
author_facet Gupta, Deepak
Attal, Kush
Demner-Fushman, Dina
author_sort Gupta, Deepak
collection PubMed
description This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 fine-grained annotated videos for the MVC task and 3,010 questions and answers timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task with the created MedVidCL and MedVidQA datasets and propose the multimodal learning methods that set competitive baselines for future research.
format Online
Article
Text
id pubmed-10031721
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10031721 2023-03-22 A dataset for medical instructional video classification and question answering Gupta, Deepak Attal, Kush Demner-Fushman, Dina Sci Data Data Descriptor This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 fine-grained annotated videos for the MVC task and 3,010 questions and answers timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task with the created MedVidCL and MedVidQA datasets and propose the multimodal learning methods that set competitive baselines for future research. Nature Publishing Group UK 2023-03-22 /pmc/articles/PMC10031721/ /pubmed/36949119 http://dx.doi.org/10.1038/s41597-023-02036-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
spellingShingle Data Descriptor
Gupta, Deepak
Attal, Kush
Demner-Fushman, Dina
A dataset for medical instructional video classification and question answering
title A dataset for medical instructional video classification and question answering
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10031721/
https://www.ncbi.nlm.nih.gov/pubmed/36949119
http://dx.doi.org/10.1038/s41597-023-02036-y