
A sampling approach to Debiasing the offline evaluation of recommender systems

Offline evaluation of recommender systems (RSs) mostly relies on historical data, which is often biased. The bias is a result of many confounders that affect the data collection process. In such biased data, user-item interactions are Missing Not At Random (MNAR). Measures of recommender system performance on MNAR test data are unlikely to be reliable indicators of real-world performance unless something is done to mitigate the bias. One widespread way that researchers try to obtain less biased offline evaluation is by designing new, supposedly unbiased performance metrics for use on MNAR test data. We investigate an alternative solution, a sampling approach. The general idea is to use a sampling strategy on MNAR data to generate an intervened test set with less bias — one in which interactions are Missing At Random (MAR) or, at least, one that is more MAR-like. An existing example of this approach is SKEW, a sampling strategy that aims to adjust for the confounding effect that an item’s popularity has on its likelihood of being observed. In this paper, after extensively surveying the literature on the bias problem in the offline evaluation of RSs, we propose and formulate a novel sampling approach, which we call WTD; we also propose a more practical variant, which we call WTD_H. We compare our methods to SKEW and to two baselines which perform a random intervention on MNAR data. We empirically validate for the first time the effectiveness of SKEW and we show our approach to be a better estimator of the performance that one would obtain on (unbiased) MAR test data. Our strategy benefits from high generality (e.g. it can also be employed for training a recommender) and low overheads (e.g. it does not require any learning).
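
To make the sampling-intervention idea above concrete, the following is a minimal Python sketch, not the paper's exact SKEW or WTD/WTD_H formulation: it assumes a simple inverse-popularity weighting of the observed MNAR interactions, and the function name and example data are hypothetical.

import random
from collections import Counter

def popularity_debiased_sample(interactions, sample_size, seed=0):
    """Draw an intervened test set from MNAR (user, item) pairs.

    Each pair is weighted inversely to its item's observed popularity, so
    interactions with popular items are down-weighted and the sampled test
    set is more MAR-like. Illustrative sketch only; the paper defines its
    own weighting schemes (SKEW, WTD, WTD_H).
    """
    rng = random.Random(seed)
    item_counts = Counter(item for _, item in interactions)
    weighted = [((user, item), 1.0 / item_counts[item])
                for user, item in interactions]
    # Weighted sampling without replacement (Efraimidis-Spirakis): draw an
    # exponential key with rate equal to each pair's weight and keep the
    # smallest keys; higher-weight pairs tend to receive smaller keys.
    keyed = sorted(weighted, key=lambda pair: rng.expovariate(pair[1]))
    return [pair[0] for pair in keyed[:sample_size]]

# Hypothetical usage: interactions with the popular "blockbuster" item are
# less likely to dominate the intervened test set than they do in the raw log.
mnar_log = [("u1", "blockbuster"), ("u2", "blockbuster"), ("u3", "blockbuster"),
            ("u1", "niche_film"), ("u4", "rare_documentary")]
intervened_test = popularity_debiased_sample(mnar_log, sample_size=3)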

Bibliographic Details
Main Authors: Carraro, Diego, Bridge, Derek
Format: Online Article Text
Language: English
Published: Springer US 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9001624/
https://www.ncbi.nlm.nih.gov/pubmed/35493700
http://dx.doi.org/10.1007/s10844-021-00651-y
_version_ 1784685714811650048
author Carraro, Diego
Bridge, Derek
author_facet Carraro, Diego
Bridge, Derek
author_sort Carraro, Diego
collection PubMed
description Offline evaluation of recommender systems (RSs) mostly relies on historical data, which is often biased. The bias is a result of many confounders that affect the data collection process. In such biased data, user-item interactions are Missing Not At Random (MNAR). Measures of recommender system performance on MNAR test data are unlikely to be reliable indicators of real-world performance unless something is done to mitigate the bias. One widespread way that researchers try to obtain less biased offline evaluation is by designing new, supposedly unbiased performance metrics for use on MNAR test data. We investigate an alternative solution, a sampling approach. The general idea is to use a sampling strategy on MNAR data to generate an intervened test set with less bias — one in which interactions are Missing At Random (MAR) or, at least, one that is more MAR-like. An existing example of this approach is SKEW, a sampling strategy that aims to adjust for the confounding effect that an item’s popularity has on its likelihood of being observed. In this paper, after extensively surveying the literature on the bias problem in the offline evaluation of RSs, we propose and formulate a novel sampling approach, which we call WTD; we also propose a more practical variant, which we call WTD_H. We compare our methods to SKEW and to two baselines which perform a random intervention on MNAR data. We empirically validate for the first time the effectiveness of SKEW and we show our approach to be a better estimator of the performance that one would obtain on (unbiased) MAR test data. Our strategy benefits from high generality (e.g. it can also be employed for training a recommender) and low overheads (e.g. it does not require any learning).
format Online
Article
Text
id pubmed-9001624
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-9001624 2022-04-27 A sampling approach to Debiasing the offline evaluation of recommender systems Carraro, Diego Bridge, Derek J Intell Inf Syst Article Offline evaluation of recommender systems (RSs) mostly relies on historical data, which is often biased. The bias is a result of many confounders that affect the data collection process. In such biased data, user-item interactions are Missing Not At Random (MNAR). Measures of recommender system performance on MNAR test data are unlikely to be reliable indicators of real-world performance unless something is done to mitigate the bias. One widespread way that researchers try to obtain less biased offline evaluation is by designing new, supposedly unbiased performance metrics for use on MNAR test data. We investigate an alternative solution, a sampling approach. The general idea is to use a sampling strategy on MNAR data to generate an intervened test set with less bias — one in which interactions are Missing At Random (MAR) or, at least, one that is more MAR-like. An existing example of this approach is SKEW, a sampling strategy that aims to adjust for the confounding effect that an item’s popularity has on its likelihood of being observed. In this paper, after extensively surveying the literature on the bias problem in the offline evaluation of RSs, we propose and formulate a novel sampling approach, which we call WTD; we also propose a more practical variant, which we call WTD_H. We compare our methods to SKEW and to two baselines which perform a random intervention on MNAR data. We empirically validate for the first time the effectiveness of SKEW and we show our approach to be a better estimator of the performance that one would obtain on (unbiased) MAR test data. Our strategy benefits from high generality (e.g. it can also be employed for training a recommender) and low overheads (e.g. it does not require any learning). Springer US 2021-07-10 2022 /pmc/articles/PMC9001624/ /pubmed/35493700 http://dx.doi.org/10.1007/s10844-021-00651-y Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Carraro, Diego
Bridge, Derek
A sampling approach to Debiasing the offline evaluation of recommender systems
title A sampling approach to Debiasing the offline evaluation of recommender systems
title_full A sampling approach to Debiasing the offline evaluation of recommender systems
title_fullStr A sampling approach to Debiasing the offline evaluation of recommender systems
title_full_unstemmed A sampling approach to Debiasing the offline evaluation of recommender systems
title_short A sampling approach to Debiasing the offline evaluation of recommender systems
title_sort sampling approach to debiasing the offline evaluation of recommender systems
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9001624/
https://www.ncbi.nlm.nih.gov/pubmed/35493700
http://dx.doi.org/10.1007/s10844-021-00651-y
work_keys_str_mv AT carrarodiego asamplingapproachtodebiasingtheofflineevaluationofrecommendersystems
AT bridgederek asamplingapproachtodebiasingtheofflineevaluationofrecommendersystems
AT carrarodiego samplingapproachtodebiasingtheofflineevaluationofrecommendersystems
AT bridgederek samplingapproachtodebiasingtheofflineevaluationofrecommendersystems