Multistage BiCross encoder for multilingual access to COVID-19 health information
The Coronavirus (COVID-19) pandemic has led to a rapidly growing ‘infodemic’ of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this pape...
Main Authors: | Singh, Iknoor, Scarton, Carolina, Bontcheva, Kalina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8423231/ https://www.ncbi.nlm.nih.gov/pubmed/34492073 http://dx.doi.org/10.1371/journal.pone.0256874 |
_version_ | 1783749422943830016 |
---|---|
author | Singh, Iknoor Scarton, Carolina Bontcheva, Kalina |
author_facet | Singh, Iknoor Scarton, Carolina Bontcheva, Kalina |
author_sort | Singh, Iknoor |
collection | PubMed |
description | The Coronavirus (COVID-19) pandemic has led to a rapidly growing ‘infodemic’ of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high precision and high recall neural Multistage BiCross encoder approach. It is a sequential three-stage ranking pipeline which uses the Okapi BM25 retrieval algorithm and transformer-based bi-encoder and cross-encoder to effectively rank the documents with respect to the given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and demonstrate that it outperforms other state-of-the-art approaches according to nearly all evaluation metrics in cases of both monolingual and bilingual runs. |
format | Online Article Text |
id | pubmed-8423231 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-84232312021-09-08 Multistage BiCross encoder for multilingual access to COVID-19 health information Singh, Iknoor Scarton, Carolina Bontcheva, Kalina PLoS One Research Article The Coronavirus (COVID-19) pandemic has led to a rapidly growing ‘infodemic’ of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high precision and high recall neural Multistage BiCross encoder approach. It is a sequential three-stage ranking pipeline which uses the Okapi BM25 retrieval algorithm and transformer-based bi-encoder and cross-encoder to effectively rank the documents with respect to the given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and demonstrate that it outperforms other state-of-the-art approaches according to nearly all evaluation metrics in cases of both monolingual and bilingual runs. Public Library of Science 2021-09-07 /pmc/articles/PMC8423231/ /pubmed/34492073 http://dx.doi.org/10.1371/journal.pone.0256874 Text en © 2021 Singh et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Singh, Iknoor Scarton, Carolina Bontcheva, Kalina Multistage BiCross encoder for multilingual access to COVID-19 health information |
title | Multistage BiCross encoder for multilingual access to COVID-19 health information |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8423231/ https://www.ncbi.nlm.nih.gov/pubmed/34492073 http://dx.doi.org/10.1371/journal.pone.0256874 |
work_keys_str_mv | AT singhiknoor multistagebicrossencoderformultilingualaccesstocovid19healthinformation AT scartoncarolina multistagebicrossencoderformultilingualaccesstocovid19healthinformation AT bontchevakalina multistagebicrossencoderformultilingualaccesstocovid19healthinformation |
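
The description field above summarizes a sequential three-stage ranking pipeline: Okapi BM25 retrieval, then a bi-encoder, then a cross-encoder. The following is a minimal sketch of that control flow only, not the authors' implementation. The BM25 function uses one common formulation with assumed parameters (`k1=1.5`, `b=0.75`). The functions `bi_encoder_score` and `cross_encoder_score` are hypothetical lexical stand-ins for the transformer models, and the cut-offs `keep_bm25` and `keep_bi` are illustrative values that do not come from the record.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def bi_encoder_score(query, doc):
    # ASSUMPTION: stand-in for a transformer bi-encoder. Query and document
    # are vectorized independently (bag of words) and compared by cosine.
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def cross_encoder_score(query, doc):
    # ASSUMPTION: stand-in for a transformer cross-encoder. The pair is
    # scored jointly; here, the fraction of query terms found in the document.
    q_terms = tokenize(query)
    d_terms = set(tokenize(doc))
    return sum(t in d_terms for t in q_terms) / len(q_terms) if q_terms else 0.0


def multistage_rank(query, docs, keep_bm25=3, keep_bi=2):
    """Return document indices after BM25, bi-encoder and cross-encoder stages."""
    bm25 = bm25_scores(query, docs)
    stage1 = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)[:keep_bm25]
    stage2 = sorted(
        stage1, key=lambda i: bi_encoder_score(query, docs[i]), reverse=True
    )[:keep_bi]
    return sorted(
        stage2, key=lambda i: cross_encoder_score(query, docs[i]), reverse=True
    )


docs = [
    "covid-19 vaccine side effects in adults",
    "football results from the weekend",
    "how the covid-19 vaccine works",
    "recipes for vegetable soup",
]
query = "covid-19 vaccine side effects"
ranking = multistage_rank(query, docs)
print([docs[i] for i in ranking])
```

Each stage shrinks the candidate set, so the most expensive scorer (the cross-encoder in the described approach) only sees the few documents that survived the cheaper stages.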