Multistage BiCross encoder for multilingual access to COVID-19 health information


Bibliographic Details
Main authors: Singh, Iknoor; Scarton, Carolina; Bontcheva, Kalina
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8423231/
https://www.ncbi.nlm.nih.gov/pubmed/34492073
http://dx.doi.org/10.1371/journal.pone.0256874
author Singh, Iknoor
Scarton, Carolina
Bontcheva, Kalina
collection PubMed
description The Coronavirus (COVID-19) pandemic has led to a rapidly growing ‘infodemic’ of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high-precision, high-recall neural Multistage BiCross encoder approach. It is a sequential three-stage ranking pipeline that uses the Okapi BM25 retrieval algorithm and transformer-based bi-encoder and cross-encoder models to rank the documents with respect to the given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and demonstrate that it outperforms other state-of-the-art approaches according to nearly all evaluation metrics for both monolingual and bilingual runs.
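The sequential three-stage ranking described in the abstract (Okapi BM25 retrieval, then a bi-encoder, then a cross-encoder) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The stage cut-offs (`top_bm25`, `top_bi`, `top_cross`) are hypothetical, and `embed` and `cross_score` are hypothetical toy stand-ins (hashed character trigrams, and a term-coverage plus bigram-order score) for the transformer bi-encoder and cross-encoder, whose details are not given in this record. BM25 uses the common defaults k1 = 1.5 and b = 0.75 with a non-negative IDF.

```python
import math
import zlib
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    # Lowercase, split on whitespace, strip surrounding punctuation (an assumption).
    tokens = (t.strip(".,;:!?()'\"").lower() for t in text.split())
    return [t for t in tokens if t]


class BM25:
    """Okapi BM25 over an in-memory corpus; k1 and b are common defaults."""

    def __init__(self, docs: Sequence[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(self.lens)
        df = Counter(term for tf in self.tfs for term in tf)
        n = len(docs)
        # Non-negative IDF variant (the one Lucene uses).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total


def embed(text: str, dim: int = 256) -> List[float]:
    # HYPOTHETICAL stand-in for a transformer bi-encoder: each text is encoded
    # on its own into an L2-normalised vector of hashed character trigrams.
    padded = f"  {text.lower()} "
    vec = [0.0] * dim
    for j in range(len(padded) - 2):
        vec[zlib.crc32(padded[j:j + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(u: List[float], v: List[float]) -> float:
    # Vectors from embed() are unit length, so the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))


def cross_score(query: str, doc: str) -> float:
    # HYPOTHETICAL stand-in for a transformer cross-encoder: it scores the
    # (query, document) pair jointly, here as query-term coverage plus the
    # share of query bigrams that also occur in the document.
    q, d = tokenize(query), tokenize(doc)
    if not q:
        return 0.0
    d_terms, d_bigrams = set(d), set(zip(d, d[1:]))
    q_bigrams = list(zip(q, q[1:]))
    coverage = sum(t in d_terms for t in q) / len(q)
    order = sum(bg in d_bigrams for bg in q_bigrams) / max(1, len(q_bigrams))
    return coverage + order


def multistage_rank(query: str, docs: Sequence[str],
                    top_bm25: int = 100, top_bi: int = 20, top_cross: int = 10) -> List[int]:
    """Return document indices after BM25 -> bi-encoder -> cross-encoder ranking."""
    bm25 = BM25(docs)
    # Stage 1: cheap lexical retrieval over the whole collection.
    stage1 = sorted(range(len(docs)), key=lambda i: bm25.score(query, i), reverse=True)[:top_bm25]
    # Stage 2: re-rank the BM25 candidates by bi-encoder cosine similarity.
    q_vec = embed(query)
    stage2 = sorted(stage1, key=lambda i: cosine(q_vec, embed(docs[i])), reverse=True)[:top_bi]
    # Stage 3: re-rank the short list with the costlier joint (cross) scorer.
    return sorted(stage2, key=lambda i: cross_score(query, docs[i]), reverse=True)[:top_cross]


if __name__ == "__main__":
    docs = [
        "COVID-19 vaccines reduce the risk of severe illness.",
        "Wash your hands to limit the spread of the coronavirus.",
        "Stock markets fell sharply in March.",
        "Vaccines against COVID-19 are safe for most adults.",
    ]
    print(multistage_rank("are covid-19 vaccines safe", docs, top_bm25=3, top_bi=2, top_cross=1))
```

The ordering follows the usual cost trade-off in such pipelines. BM25 is cheap enough to score every document. A bi-encoder encodes query and document independently, so document vectors could be precomputed. A cross-encoder reads each query-document pair jointly and is the most expensive, so it is applied only to the short list.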
format Online
Article
Text
id pubmed-8423231
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8423231; 2021-09-08; Multistage BiCross encoder for multilingual access to COVID-19 health information; Singh, Iknoor; Scarton, Carolina; Bontcheva, Kalina; PLoS One; Research Article; Public Library of Science; 2021-09-07; /pmc/articles/PMC8423231/; /pubmed/34492073; http://dx.doi.org/10.1371/journal.pone.0256874; Text; en; © 2021 Singh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Multistage BiCross encoder for multilingual access to COVID-19 health information
topic Research Article