CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice

Bibliographic Details
Main Authors: Raza, Shaina; Schwartz, Brian; Rosella, Laura C.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9160513/
https://www.ncbi.nlm.nih.gov/pubmed/35655148
http://dx.doi.org/10.1186/s12859-022-04751-6
author Raza, Shaina
Schwartz, Brian
Rosella, Laura C.
collection PubMed
description BACKGROUND: Due to the growing amount of COVID-19 research literature, medical experts, clinical scientists, and researchers frequently struggle to stay up to date on the most recent findings. There is a pressing need to assist researchers and practitioners in mining and responding to COVID-19-related questions on time. METHODS: This paper introduces CoQUAD, a question-answering system that can extract answers related to COVID-19 questions in an efficient manner. There are two datasets provided in this work: a reference-standard dataset built using the CORD-19 and LitCOVID initiatives, and a gold-standard dataset prepared by the experts from a public health domain. The CoQUAD has a Retriever component trained on the BM25 algorithm that searches the reference-standard dataset for relevant documents based on a question related to COVID-19. CoQUAD also has a Reader component that consists of a Transformer-based model, namely MPNet, which is used to read the paragraphs and find the answers related to a question from the retrieved documents. In comparison to previous works, the proposed CoQUAD system can answer questions related to early, mid, and post-COVID-19 topics. RESULTS: Extensive experiments on CoQUAD Retriever and Reader modules show that CoQUAD can provide effective and relevant answers to any COVID-19-related questions posed in natural language, with a higher level of accuracy. When compared to state-of-the-art baselines, CoQUAD outperforms the previous models, achieving an exact match ratio score of 77.50% and an F1 score of 77.10%. CONCLUSION: CoQUAD is a question-answering system that mines COVID-19 literature using natural language processing techniques to help the research community find the most recent findings and answer any related questions. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04751-6.
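The retrieve-then-evaluate idea in the description above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain Okapi BM25 scorer with the Lucene-style non-negative IDF, common default parameters (k1 = 1.5, b = 0.75), and a simplified token-level exact match and F1. The Reader step (MPNet) is omitted. The names `tokenize`, `bm25_rank`, `exact_match`, and `f1_score` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score.

    k1 and b are common defaults, assumed here; they are not from the paper.
    """
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Guard against a corpus of empty documents.
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    query_terms = tokenize(query)
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (as used by Lucene-style BM25).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def exact_match(prediction, truth):
    """1.0 if the token sequences are identical after normalization, else 0.0."""
    return float(tokenize(prediction) == tokenize(truth))


def f1_score(prediction, truth):
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred, gold = tokenize(prediction), tokenize(truth)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In a full pipeline, the top-ranked documents from `bm25_rank` would be passed to a reader model that extracts an answer span, and the extracted span would then be compared with a reference answer using `exact_match` and `f1_score`. The normalization here is simpler than the one commonly used for SQuAD-style evaluation, which also strips articles.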
format Online
Article
Text
id pubmed-9160513
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9160513 2022-06-02 CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice Raza, Shaina Schwartz, Brian Rosella, Laura C. BMC Bioinformatics Research BioMed Central 2022-06-02 /pmc/articles/PMC9160513/ /pubmed/35655148 http://dx.doi.org/10.1186/s12859-022-04751-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9160513/
https://www.ncbi.nlm.nih.gov/pubmed/35655148
http://dx.doi.org/10.1186/s12859-022-04751-6