Learning to rank query expansion terms for COVID-19 scholarly search

OBJECTIVE: With the onset of the Coronavirus Disease 2019 (COVID-19) pandemic, there has been a surge in the number of publicly available biomedical information sources, which makes it an increasingly challenging research goal to retrieve a relevant text to a topic of interest. In this paper, we pro...

Bibliographic Details
Main Authors:	Khader, Ayesha; Ensan, Faezeh
Format:	Online Article Text
Language:	English
Published:	Elsevier Inc. 2023
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174726/ https://www.ncbi.nlm.nih.gov/pubmed/37178780 http://dx.doi.org/10.1016/j.jbi.2023.104386

_version_	1785040096516374528
author	Khader, Ayesha Ensan, Faezeh
author_facet	Khader, Ayesha Ensan, Faezeh
author_sort	Khader, Ayesha
collection	PubMed
description	OBJECTIVE: With the onset of the Coronavirus Disease 2019 (COVID-19) pandemic, there has been a surge in the number of publicly available biomedical information sources, which makes it increasingly challenging to retrieve text relevant to a topic of interest. In this paper, we propose a Contextual Query Expansion framework based on the clinical Domain knowledge (CQED) for formalizing an effective search over PubMed to retrieve COVID-19 scholarly articles relevant to a given information need. MATERIALS AND METHODS: For training and evaluation, we use the widely adopted TREC-COVID benchmark. Given a query, the proposed framework uses a contextual and a domain-specific neural language model to generate a set of candidate query expansion terms that enrich the original query. Moreover, the framework includes a multi-head attention mechanism that is trained alongside a learning-to-rank model for re-ranking the list of generated candidate expansion terms. The original query and the top-ranked expansion terms are posed to the PubMed search engine to retrieve scholarly articles relevant to an information need. The framework, CQED, can have four different variations, depending on the learning path adopted for training and re-ranking the candidate expansion terms. RESULTS: The model drastically improves search performance compared to the original query. The performance improvement over the original query is 190.85% in terms of [Formula: see text] and 343.55% in terms of [Formula: see text]. Additionally, the model outperforms all existing state-of-the-art baselines. In terms of P@10, the model that has been optimized based on Precision outperforms all baselines (0.7987). In terms of NDCG@10 (0.7986), MAP (0.3450) and bpref (0.4900), on the other hand, the CQED model that has been optimized based on an average of all retrieval measures outperforms all the baselines. CONCLUSION: The proposed model successfully expands queries posed to PubMed and improves search performance compared to all existing baselines. A success/failure analysis shows that the model improved the search performance of each of the evaluated queries. Moreover, an ablation study showed that if the generated candidate terms are not ranked, the overall performance decreases. For future work, we would like to explore the application of the presented query expansion framework in conducting technology-assisted Systematic Literature Reviews (SLR).
format	Online Article Text
id	pubmed-10174726
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Elsevier Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-101747262023-05-12 Learning to rank query expansion terms for COVID-19 scholarly search Khader, Ayesha Ensan, Faezeh J Biomed Inform Original Research OBJECTIVE: With the onset of the Coronavirus Disease 2019 (COVID-19) pandemic, there has been a surge in the number of publicly available biomedical information sources, which makes it an increasingly challenging research goal to retrieve a relevant text to a topic of interest. In this paper, we propose a Contextual Query Expansion framework based on the clinical Domain knowledge (CQED) for formalizing an effective search over PubMed to retrieve relevant COVID-19 scholarly articles to a given information need. MATERIALS AND METHODS: For the sake of training and evaluation, we use the widely adopted TREC-COVID benchmark. Given a query, the proposed framework utilizes a contextual and a domain-specific neural language model to generate a set of candidate query expansion terms that enrich the original query. Moreover, the framework includes a multi-head attention mechanism that is trained alongside a learning-to-rank model for re-ranking the list of generated expansion candidate terms. The original query and the top-ranked expansion terms are posed to the PubMed search engine for retrieving relevant scholarly articles to an information need. The framework, CQED, can have four different variations, depending upon the learning path adopted for training and re-ranking the candidate expansion terms. RESULTS: The model drastically improves the search performance, when compared to the original query. The performance improvement in comparison to the original query, in terms of [Formula: see text] is 190.85% and in terms of [Formula: see text] is 343.55%. Additionally, the model outperforms all existing state-of-the-art baselines. In terms of P@10, the model that has been optimized based on Precision outperforms all baselines (0.7987). 
On the other hand, in terms of NDCG@10 (0.7986), MAP (0.3450) and bpref (0.4900), the CQED model that has been optimized based on an average of all retrieval measures outperforms all the baselines. CONCLUSION: The proposed model successfully expands queries posed to PubMed, and improves search performance, as compared to all existing baselines. A success/failure analysis shows that the model improved the search performance of each of the evaluated queries. Moreover, an ablation study depicted that if ranking of generated candidate terms is not conducted, the overall performance decreases. For future work, we would like to explore the application of the presented query expansion framework in conducting technology-assisted Systematic Literature Reviews (SLR). Elsevier Inc. 2023-06 2023-05-12 /pmc/articles/PMC10174726/ /pubmed/37178780 http://dx.doi.org/10.1016/j.jbi.2023.104386 Text en © 2023 Elsevier Inc. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle	Original Research Khader, Ayesha Ensan, Faezeh Learning to rank query expansion terms for COVID-19 scholarly search
title	Learning to rank query expansion terms for COVID-19 scholarly search
title_full	Learning to rank query expansion terms for COVID-19 scholarly search
title_fullStr	Learning to rank query expansion terms for COVID-19 scholarly search
title_full_unstemmed	Learning to rank query expansion terms for COVID-19 scholarly search
title_short	Learning to rank query expansion terms for COVID-19 scholarly search
title_sort	learning to rank query expansion terms for covid-19 scholarly search
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174726/ https://www.ncbi.nlm.nih.gov/pubmed/37178780 http://dx.doi.org/10.1016/j.jbi.2023.104386
work_keys_str_mv	AT khaderayesha learningtorankqueryexpansiontermsforcovid19scholarlysearch AT ensanfaezeh learningtorankqueryexpansiontermsforcovid19scholarlysearch
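
The description field above outlines a pipeline in which candidate expansion terms are re-ranked, the top-ranked terms are added to the original query, and the result is evaluated with measures such as P@10 and reported as a percentage improvement over the original query. The following is a minimal Python sketch of those three steps only. It does not reproduce the CQED model: the term scores are placeholder inputs standing in for the output of the learned re-ranker, and the function names, the cutoff `k`, and the choice to join terms with spaces are illustrative assumptions not stated in the record.

```python
from typing import Dict, Sequence, Set


def expand_query(query: str, term_scores: Dict[str, float], k: int = 3) -> str:
    """Append the k highest-scoring candidate terms to the original query.

    `term_scores` is a hypothetical stand-in for the re-ranker's output;
    joining with spaces is an assumption about how the query is posed.
    """
    ranked = sorted(term_scores.items(), key=lambda kv: kv[1], reverse=True)
    top_terms = [term for term, _ in ranked[:k]]
    return " ".join([query] + top_terms)


def precision_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are judged relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over a non-zero `baseline` score."""
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Made-up scores for illustration; not taken from the paper.
    scores = {"transmission": 0.9, "weather": 0.1, "aerosol": 0.7}
    print(expand_query("coronavirus spread", scores, k=2))
```

Under this reading, a score that rises from 0.2 to 0.5 corresponds to a 150% relative improvement, which is the kind of figure the record reports for the expanded queries against the original ones.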
