
Counterfactual Online Learning to Rank


Bibliographic Details
Main Authors: Zhuang, Shengyao; Zuccon, Guido
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/
http://dx.doi.org/10.1007/978-3-030-45439-5_28
Collection: PubMed
Description: Exploiting users’ implicit feedback, such as clicks, to learn rankers is attractive as it does not require editorial labelling effort, and adapts to users’ changing preferences, among other benefits. However, directly learning a ranker from implicit data is challenging, as users’ implicit feedback usually contains bias (e.g., position bias, selection bias) and noise (e.g., clicking on irrelevant but attractive snippets, adversarial clicks). Two main methods have arisen for optimizing rankers based on implicit feedback: counterfactual learning to rank (CLTR), which learns a ranker from the historical click-through data collected from a deployed, logging ranker; and online learning to rank (OLTR), where a ranker is updated by recording user interaction with a result list produced by multiple rankers (usually via interleaving). In this paper, we propose a counterfactual online learning to rank algorithm (COLTR) that combines the key components of both CLTR and OLTR. It does so by replacing the online evaluation required by traditional OLTR methods with the counterfactual evaluation common in CLTR. Compared to traditional OLTR approaches based on interleaving, COLTR can evaluate a large number of candidate rankers in a more efficient manner. Our empirical results show that COLTR significantly outperforms traditional OLTR methods. Furthermore, COLTR can reach the same effectiveness of the current state-of-the-art, under noisy click settings, and has room for future extensions.
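The key move described in the abstract is to score many candidate rankers against clicks logged under a deployed ranker, rather than interleaving their result lists online. A common way to do this in CLTR is inverse propensity scoring (IPS), which reweights each logged click by the examination probability of the position it was shown at. The sketch below is an illustrative reconstruction under that general idea, not the paper's actual COLTR algorithm; the `ips_utility` function, the DCG-style gain, and the position-bias model are all assumptions.

```python
import math

def ips_utility(sessions, rerank, propensity):
    """Counterfactual (IPS) estimate of a candidate ranker's utility from
    clicks that were logged under a different, deployed ranker.

    sessions:   iterable of (logged_ranking, clicked_doc_ids) pairs, where
                logged_ranking is the document list actually shown to the user.
    rerank:     the candidate ranker; maps a logged ranking to a new ordering
                of the same documents.
    propensity: assumed examination probability of a 0-based rank position,
                used to debias position bias in the logged clicks.
    """
    total, n_clicks = 0.0, 0
    for logged, clicks in sessions:
        candidate = rerank(logged)
        for doc in clicks:
            weight = 1.0 / propensity(logged.index(doc))        # debias logged position
            gain = 1.0 / math.log2(candidate.index(doc) + 2)    # DCG-style credit at new position
            total += weight * gain
            n_clicks += 1
    return total / max(n_clicks, 1)

# Toy example: one logged session in which the last result was clicked.
sessions = [(["d1", "d2", "d3"], ["d3"])]
prop = lambda rank: 1.0 / (rank + 1)  # assumed position-bias model

keep = ips_utility(sessions, lambda r: list(r), prop)            # keep logging order
flip = ips_utility(sessions, lambda r: list(reversed(r)), prop)  # clicked doc first
# The candidate that promotes the clicked document receives the higher estimate.
```

Because the estimate needs only the logged rankings and clicks, an arbitrary number of candidate rankers can be scored against the same interaction data, which is the efficiency advantage over interleaving that the abstract claims.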
Record ID: pubmed-7148247 (MEDLINE/PubMed record format)
Institution: National Center for Biotechnology Information
Published in: Advances in Information Retrieval (Article), first published online 2020-03-17.
Rights: © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.