Unsupervised query reduction for efficient yet effective news background linking

In this article, we study efficient techniques to tackle the news background linking problem, in which an online reader seeks background knowledge about a given article to better understand its context. Recently, this problem attracted many researchers, especially in the Text Retrieval Conference (T...

Bibliographic Details
Main authors: Essam, Marwa; Elsayed, Tamer
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280215/
https://www.ncbi.nlm.nih.gov/pubmed/37346502
http://dx.doi.org/10.7717/peerj-cs.1191
author Essam, Marwa
Elsayed, Tamer
author_facet Essam, Marwa
Elsayed, Tamer
author_sort Essam, Marwa
collection PubMed
description In this article, we study efficient techniques to tackle the news background linking problem, in which an online reader seeks background knowledge about a given article to better understand its context. Recently, this problem attracted many researchers, especially in the Text Retrieval Conference (TREC) community. Surprisingly, the most effective method to date uses the entire input news article as a search query in an ad-hoc retrieval approach to retrieve the background links. In a scenario where the lookup for background links is performed online, this method becomes inefficient, especially if the search scope is big such as the Web, due to the relatively long generated query, which results in a long response time. In this work, we evaluate different unsupervised approaches for reducing the input news article to a much shorter, hence efficient, search query, while maintaining the retrieval effectiveness. We conducted several experiments using the Washington Post dataset, released specifically for the news background linking problem. Our results show that a simple statistical analysis of the article using a recent keyword extraction technique reaches an average of 6.2× speedup in query response time over the full article approach, with no significant difference in effectiveness. Moreover, we found that further reduction of the search terms can be achieved by eliminating relatively low TF-IDF values from the search queries, yielding even more efficient retrieval of 13.3× speedup, while still maintaining the retrieval effectiveness. This makes our approach more suitable for practical online scenarios. Our study is the first to address the efficiency of news background linking systems. We, therefore, release our source code to promote research in that direction.
format Online
Article
Text
id pubmed-10280215
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10280215 2023-06-21 Unsupervised query reduction for efficient yet effective news background linking Essam, Marwa Elsayed, Tamer PeerJ Comput Sci Algorithms and Analysis of Algorithms
PeerJ Inc. 2023-01-13 /pmc/articles/PMC10280215/ /pubmed/37346502 http://dx.doi.org/10.7717/peerj-cs.1191 Text en © 2023 Essam and Elsayed https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Algorithms and Analysis of Algorithms
Essam, Marwa
Elsayed, Tamer
Unsupervised query reduction for efficient yet effective news background linking
title Unsupervised query reduction for efficient yet effective news background linking
topic Algorithms and Analysis of Algorithms
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280215/
https://www.ncbi.nlm.nih.gov/pubmed/37346502
http://dx.doi.org/10.7717/peerj-cs.1191