A comparative analysis of system features used in the TREC-COVID information retrieval challenge
The COVID-19 pandemic has resulted in a rapidly growing quantity of scientific publications from journal articles, preprints, and other sources. The TREC-COVID Challenge was created to evaluate information retrieval (IR) methods and systems for this quickly expanding corpus. Using the COVID-19 Open...
Main Authors: | Chen, Jimmy S., Hersh, William R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Inc., 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8021447/ https://www.ncbi.nlm.nih.gov/pubmed/33831536 http://dx.doi.org/10.1016/j.jbi.2021.103745 |
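The description in this record mentions extracting features from team run reports and using a univariate and multivariate regression-based analysis to identify features associated with higher retrieval performance. As a rough illustration of that kind of analysis only (not the authors' actual code; the feature names, run scores, and choice of NDCG@10 as the performance metric below are hypothetical), a minimal sketch using pandas and statsmodels:

```python
# Illustrative sketch: regress binary system features extracted from
# TREC-COVID run reports against a retrieval metric, in the spirit of the
# univariate and multivariate analysis described in the abstract.
# All data values and feature names here are hypothetical.
import pandas as pd
import statsmodels.api as sm

# Hypothetical per-run data: one row per submitted run, binary indicators
# for whether a technique was used, and the run's evaluation score.
runs = pd.DataFrame({
    "fine_tuned_msmarco": [1, 0, 1, 1, 0, 0, 1, 0],
    "term_expansion":     [1, 1, 0, 1, 0, 1, 0, 0],
    "used_narrative":     [0, 1, 1, 0, 1, 0, 0, 1],
    "ndcg_at_10":         [0.62, 0.48, 0.55, 0.66, 0.41, 0.52, 0.60, 0.39],
})

features = ["fine_tuned_msmarco", "term_expansion", "used_narrative"]

# Univariate analysis: fit one ordinary least squares model per feature.
for feat in features:
    X = sm.add_constant(runs[[feat]])
    model = sm.OLS(runs["ndcg_at_10"], X).fit()
    print(f"{feat}: coef={model.params[feat]:+.3f}, p={model.pvalues[feat]:.3f}")

# Multivariate analysis: all features in a single model.
X = sm.add_constant(runs[features])
multi = sm.OLS(runs["ndcg_at_10"], X).fit()
print(multi.summary())
```

A positive coefficient with a small p-value would suggest the feature is associated with higher scores across runs; in practice the paper's conclusions depend on the actual run reports and judged metrics from Rounds 2 and 5.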
_version_ | 1783674748646981632 |
---|---|
author | Chen, Jimmy S. Hersh, William R. |
author_facet | Chen, Jimmy S. Hersh, William R. |
author_sort | Chen, Jimmy S. |
collection | PubMed |
description | The COVID-19 pandemic has resulted in a rapidly growing quantity of scientific publications from journal articles, preprints, and other sources. The TREC-COVID Challenge was created to evaluate information retrieval (IR) methods and systems for this quickly expanding corpus. Using the COVID-19 Open Research Dataset (CORD-19), several dozen research teams participated in over 5 rounds of the TREC-COVID Challenge. While previous work has compared IR techniques used on other test collections, there are no studies that have analyzed the methods used by participants in the TREC-COVID Challenge. We manually reviewed team run reports from Rounds 2 and 5, extracted features from the documented methodologies, and used a univariate and multivariate regression-based analysis to identify features associated with higher retrieval performance. We observed that fine-tuning datasets with relevance judgments, MS-MARCO, and CORD-19 document vectors was associated with improved performance in Round 2 but not in Round 5. Though the relatively decreased heterogeneity of runs in Round 5 may explain the lack of significance in that round, fine-tuning has been found to improve search performance in previous challenge evaluations by improving a system’s ability to map relevant queries and phrases to documents. Furthermore, term expansion was associated with improvement in system performance, and the use of the narrative field in the TREC-COVID topics was associated with decreased system performance in both rounds. These findings emphasize the need for clear queries in search. While our study has some limitations in its generalizability and scope of techniques analyzed, we identified some IR techniques that may be useful in building search systems for COVID-19 using the TREC-COVID test collections. |
format | Online Article Text |
id | pubmed-8021447 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-80214472021-04-06 A comparative analysis of system features used in the TREC-COVID information retrieval challenge Chen, Jimmy S. Hersh, William R. J Biomed Inform Original Research The COVID-19 pandemic has resulted in a rapidly growing quantity of scientific publications from journal articles, preprints, and other sources. The TREC-COVID Challenge was created to evaluate information retrieval (IR) methods and systems for this quickly expanding corpus. Using the COVID-19 Open Research Dataset (CORD-19), several dozen research teams participated in over 5 rounds of the TREC-COVID Challenge. While previous work has compared IR techniques used on other test collections, there are no studies that have analyzed the methods used by participants in the TREC-COVID Challenge. We manually reviewed team run reports from Rounds 2 and 5, extracted features from the documented methodologies, and used a univariate and multivariate regression-based analysis to identify features associated with higher retrieval performance. We observed that fine-tuning datasets with relevance judgments, MS-MARCO, and CORD-19 document vectors was associated with improved performance in Round 2 but not in Round 5. Though the relatively decreased heterogeneity of runs in Round 5 may explain the lack of significance in that round, fine-tuning has been found to improve search performance in previous challenge evaluations by improving a system’s ability to map relevant queries and phrases to documents. Furthermore, term expansion was associated with improvement in system performance, and the use of the narrative field in the TREC-COVID topics was associated with decreased system performance in both rounds. These findings emphasize the need for clear queries in search. While our study has some limitations in its generalizability and scope of techniques analyzed, we identified some IR techniques that may be useful in building search systems for COVID-19 using the TREC-COVID test collections. Elsevier Inc. 2021-05 2021-04-06 /pmc/articles/PMC8021447/ /pubmed/33831536 http://dx.doi.org/10.1016/j.jbi.2021.103745 Text en © 2021 Elsevier Inc. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Original Research Chen, Jimmy S. Hersh, William R. A comparative analysis of system features used in the TREC-COVID information retrieval challenge |
title | A comparative analysis of system features used in the TREC-COVID information retrieval challenge |
title_full | A comparative analysis of system features used in the TREC-COVID information retrieval challenge |
title_fullStr | A comparative analysis of system features used in the TREC-COVID information retrieval challenge |
title_full_unstemmed | A comparative analysis of system features used in the TREC-COVID information retrieval challenge |
title_short | A comparative analysis of system features used in the TREC-COVID information retrieval challenge |
title_sort | comparative analysis of system features used in the trec-covid information retrieval challenge |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8021447/ https://www.ncbi.nlm.nih.gov/pubmed/33831536 http://dx.doi.org/10.1016/j.jbi.2021.103745 |
work_keys_str_mv | AT chenjimmys acomparativeanalysisofsystemfeaturesusedinthetreccovidinformationretrievalchallenge AT hershwilliamr acomparativeanalysisofsystemfeaturesusedinthetreccovidinformationretrievalchallenge AT chenjimmys comparativeanalysisofsystemfeaturesusedinthetreccovidinformationretrievalchallenge AT hershwilliamr comparativeanalysisofsystemfeaturesusedinthetreccovidinformationretrievalchallenge |