Information Retrieval in an Infodemic: The Case of COVID-19 Publications

BACKGROUND: The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19–related corpora are being created, sometimes with inaccurate information, which is no longer at scale of human analyses. OB...

Bibliographic Details
Main Authors: Teodoro, Douglas; Ferdowsi, Sohrab; Borissov, Nikolay; Kashani, Elham; Vicente Alvarez, David; Copara, Jenny; Gouareb, Racha; Naderi, Nona; Amini, Poorya
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8451964/
https://www.ncbi.nlm.nih.gov/pubmed/34375298
http://dx.doi.org/10.2196/30161
author Teodoro, Douglas
Ferdowsi, Sohrab
Borissov, Nikolay
Kashani, Elham
Vicente Alvarez, David
Copara, Jenny
Gouareb, Racha
Naderi, Nona
Amini, Poorya
author_sort Teodoro, Douglas
collection PubMed
description BACKGROUND: The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19–related corpora are being created, sometimes with inaccurate information, which is no longer at scale of human analyses. OBJECTIVE: In the context of searching for scientific evidence in the deluge of COVID-19–related literature, we present an information retrieval methodology for effective identification of relevant sources to answer biomedical queries posed using natural language. METHODS: Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. Similarity of COVID-19 queries is compared to documents, and a series of postprocessing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and boost the position of relevant documents. RESULTS: The methodology was evaluated in the context of the TREC-COVID challenge, achieving competitive results with the top-ranking teams participating in the competition. Particularly, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25–based baseline, retrieving on average, 83% of relevant documents in the top 20. CONCLUSIONS: These results indicate that multistage retrieval supported by deep learning could enhance identification of literature for COVID-19–related questions posed using natural language.
format Online
Article
Text
id pubmed-8451964
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8451964 2021-10-18 Information Retrieval in an Infodemic: The Case of COVID-19 Publications Teodoro, Douglas Ferdowsi, Sohrab Borissov, Nikolay Kashani, Elham Vicente Alvarez, David Copara, Jenny Gouareb, Racha Naderi, Nona Amini, Poorya J Med Internet Res Original Paper BACKGROUND: The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19–related corpora are being created, sometimes with inaccurate information, which is no longer at scale of human analyses. OBJECTIVE: In the context of searching for scientific evidence in the deluge of COVID-19–related literature, we present an information retrieval methodology for effective identification of relevant sources to answer biomedical queries posed using natural language. METHODS: Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. Similarity of COVID-19 queries is compared to documents, and a series of postprocessing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and boost the position of relevant documents. RESULTS: The methodology was evaluated in the context of the TREC-COVID challenge, achieving competitive results with the top-ranking teams participating in the competition. Particularly, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25–based baseline, retrieving on average, 83% of relevant documents in the top 20. CONCLUSIONS: These results indicate that multistage retrieval supported by deep learning could enhance identification of literature for COVID-19–related questions posed using natural language.
JMIR Publications 2021-09-17 /pmc/articles/PMC8451964/ /pubmed/34375298 http://dx.doi.org/10.2196/30161 Text en ©Douglas Teodoro, Sohrab Ferdowsi, Nikolay Borissov, Elham Kashani, David Vicente Alvarez, Jenny Copara, Racha Gouareb, Nona Naderi, Poorya Amini. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 17.09.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title Information Retrieval in an Infodemic: The Case of COVID-19 Publications
topic Original Paper