CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice
BACKGROUND: Due to the growing amount of COVID-19 research literature, medical experts, clinical scientists, and researchers frequently struggle to stay up to date on the most recent findings. There is a pressing need to assist researchers and practitioners in mining and responding to COVID-19-relat...
Main authors: | Raza, Shaina; Schwartz, Brian; Rosella, Laura C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2022
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9160513/ https://www.ncbi.nlm.nih.gov/pubmed/35655148 http://dx.doi.org/10.1186/s12859-022-04751-6 |
_version_ | 1784719286296641536 |
---|---|
author | Raza, Shaina Schwartz, Brian Rosella, Laura C. |
author_facet | Raza, Shaina Schwartz, Brian Rosella, Laura C. |
author_sort | Raza, Shaina |
collection | PubMed |
description | BACKGROUND: Due to the growing amount of COVID-19 research literature, medical experts, clinical scientists, and researchers frequently struggle to stay up to date on the most recent findings. There is a pressing need to assist researchers and practitioners in mining and responding to COVID-19-related questions on time. METHODS: This paper introduces CoQUAD, a question-answering system that can extract answers related to COVID-19 questions in an efficient manner. There are two datasets provided in this work: a reference-standard dataset built using the CORD-19 and LitCOVID initiatives, and a gold-standard dataset prepared by the experts from a public health domain. The CoQUAD has a Retriever component trained on the BM25 algorithm that searches the reference-standard dataset for relevant documents based on a question related to COVID-19. CoQUAD also has a Reader component that consists of a Transformer-based model, namely MPNet, which is used to read the paragraphs and find the answers related to a question from the retrieved documents. In comparison to previous works, the proposed CoQUAD system can answer questions related to early, mid, and post-COVID-19 topics. RESULTS: Extensive experiments on CoQUAD Retriever and Reader modules show that CoQUAD can provide effective and relevant answers to any COVID-19-related questions posed in natural language, with a higher level of accuracy. When compared to state-of-the-art baselines, CoQUAD outperforms the previous models, achieving an exact match ratio score of 77.50% and an F1 score of 77.10%. CONCLUSION: CoQUAD is a question-answering system that mines COVID-19 literature using natural language processing techniques to help the research community find the most recent findings and answer any related questions. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04751-6. |
format | Online Article Text |
id | pubmed-9160513 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9160513 2022-06-02 CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice Raza, Shaina Schwartz, Brian Rosella, Laura C. BMC Bioinformatics Research BioMed Central 2022-06-02 /pmc/articles/PMC9160513/ /pubmed/35655148 http://dx.doi.org/10.1186/s12859-022-04751-6 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Raza, Shaina Schwartz, Brian Rosella, Laura C. CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice |
title | CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9160513/ https://www.ncbi.nlm.nih.gov/pubmed/35655148 http://dx.doi.org/10.1186/s12859-022-04751-6 |
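
The abstract in this record describes a Retriever that uses BM25 to find documents relevant to a question before a Reader extracts the answer. The following is a minimal, illustrative sketch of Okapi BM25 ranking only. It is not the authors' implementation: the function names, the whitespace tokenizer, the parameter values k1=1.5 and b=0.75, the smoothed IDF variant, and the three-sentence toy corpus are all assumptions made for the example, and the MPNet Reader stage is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score.

    k1 and b are commonly used defaults, not values taken from the paper.
    Assumes `documents` is a non-empty list of non-empty strings.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for idx, toks in enumerate(tokenized):
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Smoothed IDF variant that stays non-negative.
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up toy corpus, used only to exercise the function.
toy_corpus = [
    "masks reduce transmission of respiratory viruses",
    "vaccines reduce severe covid-19 outcomes",
    "hospital capacity planning during influenza season",
]
ranking = bm25_rank("covid-19 vaccines", toy_corpus)
top_index = ranking[0][0]
```

In a retriever-reader pipeline of the kind the abstract outlines, the top-ranked documents from a step like this would be passed to the Reader model, which searches their paragraphs for an answer span.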