A dataset for medical instructional video classification and question answering
This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medica...
Main authors: | Gupta, Deepak; Attal, Kush; Demner-Fushman, Dina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | Data Descriptor |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10031721/ https://www.ncbi.nlm.nih.gov/pubmed/36949119 http://dx.doi.org/10.1038/s41597-023-02036-y |
_version_ | 1784910666528718848 |
---|---|
author | Gupta, Deepak Attal, Kush Demner-Fushman, Dina |
author_facet | Gupta, Deepak Attal, Kush Demner-Fushman, Dina |
author_sort | Gupta, Deepak |
collection | PubMed |
description | This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 fine-grained annotated videos for the MVC task and 3,010 questions and answers timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task with the created MedVidCL and MedVidQA datasets and propose the multimodal learning methods that set competitive baselines for future research. |
format | Online Article Text |
id | pubmed-10031721 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10031721 2023-03-22 A dataset for medical instructional video classification and question answering Gupta, Deepak Attal, Kush Demner-Fushman, Dina Sci Data Data Descriptor This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 fine-grained annotated videos for the MVC task and 3,010 questions and answers timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task with the created MedVidCL and MedVidQA datasets and propose the multimodal learning methods that set competitive baselines for future research. Nature Publishing Group UK 2023-03-22 /pmc/articles/PMC10031721/ /pubmed/36949119 http://dx.doi.org/10.1038/s41597-023-02036-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Data Descriptor Gupta, Deepak Attal, Kush Demner-Fushman, Dina A dataset for medical instructional video classification and question answering |
title | A dataset for medical instructional video classification and question answering |
title_full | A dataset for medical instructional video classification and question answering |
title_fullStr | A dataset for medical instructional video classification and question answering |
title_full_unstemmed | A dataset for medical instructional video classification and question answering |
title_short | A dataset for medical instructional video classification and question answering |
title_sort | dataset for medical instructional video classification and question answering |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10031721/ https://www.ncbi.nlm.nih.gov/pubmed/36949119 http://dx.doi.org/10.1038/s41597-023-02036-y |
work_keys_str_mv | AT guptadeepak adatasetformedicalinstructionalvideoclassificationandquestionanswering AT attalkush adatasetformedicalinstructionalvideoclassificationandquestionanswering AT demnerfushmandina adatasetformedicalinstructionalvideoclassificationandquestionanswering AT guptadeepak datasetformedicalinstructionalvideoclassificationandquestionanswering AT attalkush datasetformedicalinstructionalvideoclassificationandquestionanswering AT demnerfushmandina datasetformedicalinstructionalvideoclassificationandquestionanswering |