
Counterfactual Online Learning to Rank


Bibliographic Details
Main Authors: Zhuang, Shengyao; Zuccon, Guido
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/
http://dx.doi.org/10.1007/978-3-030-45439-5_28
Collection: PubMed
Description: Exploiting users’ implicit feedback, such as clicks, to learn rankers is attractive as it does not require editorial labelling effort, and adapts to users’ changing preferences, among other benefits. However, directly learning a ranker from implicit data is challenging, as users’ implicit feedback usually contains bias (e.g., position bias, selection bias) and noise (e.g., clicking on irrelevant but attractive snippets, adversarial clicks). Two main methods have arisen for optimizing rankers based on implicit feedback: counterfactual learning to rank (CLTR), which learns a ranker from the historical click-through data collected from a deployed, logging ranker; and online learning to rank (OLTR), where a ranker is updated by recording user interaction with a result list produced by multiple rankers (usually via interleaving). In this paper, we propose a counterfactual online learning to rank algorithm (COLTR) that combines the key components of both CLTR and OLTR. It does so by replacing the online evaluation required by traditional OLTR methods with the counterfactual evaluation common in CLTR. Compared to traditional OLTR approaches based on interleaving, COLTR can evaluate a large number of candidate rankers in a more efficient manner. Our empirical results show that COLTR significantly outperforms traditional OLTR methods. Furthermore, COLTR can reach the same effectiveness of the current state-of-the-art, under noisy click settings, and has room for future extensions.
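The key move described in the abstract is to score many candidate rankers against clicks logged under a deployed ranker, rather than interleaving their result lists online. A common way to do this in CLTR is inverse propensity scoring (IPS), which reweights each logged click by the examination probability of the position it was shown at. The sketch below is an illustrative reconstruction under that general idea, not the paper's actual COLTR algorithm; the `ips_utility` function, the DCG-style gain, and the position-bias model are all assumptions.

```python
import math

def ips_utility(sessions, rerank, propensity):
    """Counterfactual (IPS) estimate of a candidate ranker's utility from
    clicks that were logged under a different, deployed ranker.

    sessions:   iterable of (logged_ranking, clicked_doc_ids) pairs, where
                logged_ranking is the document list actually shown to the user.
    rerank:     the candidate ranker; maps a logged ranking to a new ordering
                of the same documents.
    propensity: assumed examination probability of a 0-based rank position,
                used to debias position bias in the logged clicks.
    """
    total, n_clicks = 0.0, 0
    for logged, clicks in sessions:
        candidate = rerank(logged)
        for doc in clicks:
            weight = 1.0 / propensity(logged.index(doc))        # debias logged position
            gain = 1.0 / math.log2(candidate.index(doc) + 2)    # DCG-style credit at new position
            total += weight * gain
            n_clicks += 1
    return total / max(n_clicks, 1)

# Toy example: one logged session in which the last result was clicked.
sessions = [(["d1", "d2", "d3"], ["d3"])]
prop = lambda rank: 1.0 / (rank + 1)  # assumed position-bias model

keep = ips_utility(sessions, lambda r: list(r), prop)            # keep logging order
flip = ips_utility(sessions, lambda r: list(reversed(r)), prop)  # clicked doc first
# The candidate that promotes the clicked document receives the higher estimate.
```

Because the estimate needs only the logged rankings and clicks, an arbitrary number of candidate rankers can be scored against the same interaction data, which is the efficiency advantage over interleaving that the abstract claims.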
Record ID: pubmed-7148247 (MEDLINE/PubMed record format)
Institution: National Center for Biotechnology Information
Published in: Advances in Information Retrieval (Article), first published online 2020-03-17.
Rights: © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.