Counterfactual Online Learning to Rank
Exploiting users’ implicit feedback, such as clicks, to learn rankers is attractive as it does not require editorial labelling effort, and adapts to users’ changing preferences, among other benefits. However, directly learning a ranker from implicit data is challenging, as users’ implicit feedback usually contains bias (e.g., position bias, selection bias) and noise (e.g., clicking on irrelevant but attractive snippets, adversarial clicks). Two main methods have arisen for optimizing rankers based on implicit feedback: counterfactual learning to rank (CLTR), which learns a ranker from the historical click-through data collected from a deployed logging ranker; and online learning to rank (OLTR), where a ranker is updated by recording user interaction with a result list produced by multiple rankers (usually via interleaving). In this paper, we propose a counterfactual online learning to rank algorithm (COLTR) that combines the key components of both CLTR and OLTR. It does so by replacing the online evaluation required by traditional OLTR methods with the counterfactual evaluation common in CLTR. Compared to traditional OLTR approaches based on interleaving, COLTR can evaluate a large number of candidate rankers in a more efficient manner. Our empirical results show that COLTR significantly outperforms traditional OLTR methods. Furthermore, COLTR can reach the same effectiveness as the current state-of-the-art under noisy click settings, and has room for future extensions.
Main Authors: | Zhuang, Shengyao; Zuccon, Guido |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/ http://dx.doi.org/10.1007/978-3-030-45439-5_28 |
_version_ | 1783520552839806976 |
---|---|
author | Zhuang, Shengyao Zuccon, Guido |
author_facet | Zhuang, Shengyao Zuccon, Guido |
author_sort | Zhuang, Shengyao |
collection | PubMed |
description | Exploiting users’ implicit feedback, such as clicks, to learn rankers is attractive as it does not require editorial labelling effort, and adapts to users’ changing preferences, among other benefits. However, directly learning a ranker from implicit data is challenging, as users’ implicit feedback usually contains bias (e.g., position bias, selection bias) and noise (e.g., clicking on irrelevant but attractive snippets, adversarial clicks). Two main methods have arisen for optimizing rankers based on implicit feedback: counterfactual learning to rank (CLTR), which learns a ranker from the historical click-through data collected from a deployed logging ranker; and online learning to rank (OLTR), where a ranker is updated by recording user interaction with a result list produced by multiple rankers (usually via interleaving). In this paper, we propose a counterfactual online learning to rank algorithm (COLTR) that combines the key components of both CLTR and OLTR. It does so by replacing the online evaluation required by traditional OLTR methods with the counterfactual evaluation common in CLTR. Compared to traditional OLTR approaches based on interleaving, COLTR can evaluate a large number of candidate rankers in a more efficient manner. Our empirical results show that COLTR significantly outperforms traditional OLTR methods. Furthermore, COLTR can reach the same effectiveness as the current state-of-the-art under noisy click settings, and has room for future extensions. (A minimal illustrative sketch of the counterfactual evaluation idea is given after this record.) |
format | Online Article Text |
id | pubmed-7148247 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7148247 2020-04-13 Counterfactual Online Learning to Rank Zhuang, Shengyao Zuccon, Guido Advances in Information Retrieval Article Exploiting users’ implicit feedback, such as clicks, to learn rankers is attractive as it does not require editorial labelling effort, and adapts to users’ changing preferences, among other benefits. However, directly learning a ranker from implicit data is challenging, as users’ implicit feedback usually contains bias (e.g., position bias, selection bias) and noise (e.g., clicking on irrelevant but attractive snippets, adversarial clicks). Two main methods have arisen for optimizing rankers based on implicit feedback: counterfactual learning to rank (CLTR), which learns a ranker from the historical click-through data collected from a deployed logging ranker; and online learning to rank (OLTR), where a ranker is updated by recording user interaction with a result list produced by multiple rankers (usually via interleaving). In this paper, we propose a counterfactual online learning to rank algorithm (COLTR) that combines the key components of both CLTR and OLTR. It does so by replacing the online evaluation required by traditional OLTR methods with the counterfactual evaluation common in CLTR. Compared to traditional OLTR approaches based on interleaving, COLTR can evaluate a large number of candidate rankers in a more efficient manner. Our empirical results show that COLTR significantly outperforms traditional OLTR methods. Furthermore, COLTR can reach the same effectiveness as the current state-of-the-art under noisy click settings, and has room for future extensions. 2020-03-17 /pmc/articles/PMC7148247/ http://dx.doi.org/10.1007/978-3-030-45439-5_28 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Zhuang, Shengyao Zuccon, Guido Counterfactual Online Learning to Rank |
title | Counterfactual Online Learning to Rank |
title_full | Counterfactual Online Learning to Rank |
title_fullStr | Counterfactual Online Learning to Rank |
title_full_unstemmed | Counterfactual Online Learning to Rank |
title_short | Counterfactual Online Learning to Rank |
title_sort | counterfactual online learning to rank |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/ http://dx.doi.org/10.1007/978-3-030-45439-5_28 |
work_keys_str_mv | AT zhuangshengyao counterfactualonlinelearningtorank AT zucconguido counterfactualonlinelearningtorank |
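The description above says that COLTR replaces the interleaving-based online evaluation of traditional OLTR with the counterfactual evaluation common in CLTR, so that many candidate rankers can be compared against the same logged clicks. The sketch below illustrates that general idea (inverse propensity scoring of logged clicks), not the authors' actual COLTR algorithm: the synthetic features, the 1/rank propensity model, the simulated clicks, and names such as `logging_w` and `ips_utility` are all hypothetical choices made for this example.

```python
# Minimal, hypothetical sketch of counterfactual (off-policy) evaluation of
# candidate rankers from logged clicks. All data and modelling choices are synthetic.
import numpy as np

rng = np.random.default_rng(0)

n_docs, n_features = 20, 5
X = rng.normal(size=(n_docs, n_features))    # feature vectors of candidate documents for one query
logging_w = rng.normal(size=n_features)      # weights of the deployed (logging) ranker
logged_ranking = np.argsort(-X @ logging_w)  # document order that was shown to the user

# Assumed position-based examination model: propensity of examining rank k is 1/k.
propensity = 1.0 / np.arange(1, n_docs + 1)

# Simulate the logged interaction: clicks happen on examined, "relevant" documents.
true_relevance = (X[:, 0] > 0).astype(float)        # hypothetical relevance signal
examined = rng.random(n_docs) < propensity
clicks = examined * true_relevance[logged_ranking]  # one entry per displayed position

def ips_utility(candidate_w):
    """Inverse-propensity-scored utility estimate of a candidate ranker.

    Each logged click is re-weighted by 1/propensity of the position it was
    displayed at, and credited at the rank the candidate ranker would assign
    the clicked document (DCG-style discount).
    """
    candidate_rank_of = np.empty(n_docs, dtype=int)
    candidate_rank_of[np.argsort(-X @ candidate_w)] = np.arange(n_docs)
    utility = 0.0
    for pos, doc in enumerate(logged_ranking):
        if clicks[pos]:
            utility += (1.0 / np.log2(candidate_rank_of[doc] + 2)) / propensity[pos]
    return utility

# Many candidate rankers can be scored against the same logged data, which is
# the efficiency argument the abstract makes relative to interleaving.
candidates = [logging_w + 0.5 * rng.normal(size=n_features) for _ in range(50)]
scores = [ips_utility(w) for w in candidates]
best = int(np.argmax(scores))
print(f"best candidate: {best}, estimated utility: {scores[best]:.3f}")
```

In practice the examination propensities are not known and must be estimated, and COLTR itself combines this style of counterfactual ranker comparison with online updates; the sketch only shows why a single log of clicks suffices to score a large pool of candidates.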