A dataset for medical instructional video classification and question answering

This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medica...

Bibliographic Details
Main Authors:	Gupta, Deepak; Attal, Kush; Demner-Fushman, Dina
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Data Descriptor
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10031721/ https://www.ncbi.nlm.nih.gov/pubmed/36949119 http://dx.doi.org/10.1038/s41597-023-02036-y

author	Gupta, Deepak Attal, Kush Demner-Fushman, Dina
collection	PubMed
description	This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 fine-grained annotated videos for the MVC task and 3,010 questions and answers timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task with the created MedVidCL and MedVidQA datasets and propose the multimodal learning methods that set competitive baselines for future research.
format	Online Article Text
id	pubmed-10031721
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10031721 2023-03-22 A dataset for medical instructional video classification and question answering Gupta, Deepak Attal, Kush Demner-Fushman, Dina Sci Data Data Descriptor This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 fine-grained annotated videos for the MVC task and 3,010 questions and answers timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task with the created MedVidCL and MedVidQA datasets and propose the multimodal learning methods that set competitive baselines for future research. Nature Publishing Group UK 2023-03-22 /pmc/articles/PMC10031721/ /pubmed/36949119 http://dx.doi.org/10.1038/s41597-023-02036-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	A dataset for medical instructional video classification and question answering
topic	Data Descriptor
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10031721/ https://www.ncbi.nlm.nih.gov/pubmed/36949119 http://dx.doi.org/10.1038/s41597-023-02036-y
