Off-Policy Recommendation System Without Exploration

A recommendation system (RS) can be treated as an intelligent agent that aims to generate a policy maximizing customers’ long-term satisfaction. Off-policy reinforcement learning methods based on Q-learning and actor-critic methods are commonly used to train an RS. Although these methods can leverage previo...

Bibliographic Details
Main authors:	Wang, Chengwei; Zhou, Tengfei; Chen, Chen; Hu, Tianlei; Chen, Gang
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206175/ http://dx.doi.org/10.1007/978-3-030-47426-3_2

_version_	1783530362572374016
author	Wang, Chengwei Zhou, Tengfei Chen, Chen Hu, Tianlei Chen, Gang
author_facet	Wang, Chengwei Zhou, Tengfei Chen, Chen Hu, Tianlei Chen, Gang
author_sort	Wang, Chengwei
collection	PubMed
description	A recommendation system (RS) can be treated as an intelligent agent that aims to generate a policy maximizing customers’ long-term satisfaction. Off-policy reinforcement learning methods based on Q-learning and actor-critic methods are commonly used to train an RS. Although these methods can leverage previously collected datasets for sample-efficient training, they are sensitive to the distribution of off-policy data and make limited progress unless more on-policy data are collected. However, allowing a badly trained RS to interact with customers can result in unpredictable losses. Therefore, it is highly desirable that an off-policy method can stably train an RS when the off-policy data are fixed and there is no further interaction with the environment. To fulfill these requirements, we devise a novel method named Generator Constrained Q-learning (GCQ). GCQ additionally trains an action generator via supervised learning. The generator is used to mimic the data distribution and stabilize the performance of the recommendation policy. Empirical studies show that the proposed method outperforms state-of-the-art techniques in both offline and simulated online environments.
format	Online Article Text
id	pubmed-7206175
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-72061752020-05-08 Off-Policy Recommendation System Without Exploration Wang, Chengwei Zhou, Tengfei Chen, Chen Hu, Tianlei Chen, Gang Advances in Knowledge Discovery and Data Mining Article Recommendation System (RS) can be treated as an intelligent agent which aims to generate policy maximizing customers’ long term satisfaction. Off-policy reinforcement learning methods based on Q-learning and actor-critic methods are commonly used to train RS. Though these methods can leverage previously collected dataset for sampling efficient training, they are sensitive to the distribution of off-policy data and make limited progress unless more on-policy data are collected. However, allowing a badly-trained RS to interact with customers can result in unpredictable loss. Therefore, it is highly desirable that the off-policy method can stably train an RS when the off-policy data is fixed and there is no further interaction with the environment. To fulfill these requirements, we devise a novel method name Generator Constrained Q-learning (GCQ). GCQ additionally trains an action generator via supervised learning. The generator is used to mimic data distribution and stabilize the performance of recommendation policy. Empirical studies show that the proposed method outperforms state-of-the-art techniques on both offline and simulated online environments. 2020-04-17 /pmc/articles/PMC7206175/ http://dx.doi.org/10.1007/978-3-030-47426-3_2 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Article Wang, Chengwei Zhou, Tengfei Chen, Chen Hu, Tianlei Chen, Gang Off-Policy Recommendation System Without Exploration
title	Off-Policy Recommendation System Without Exploration
title_full	Off-Policy Recommendation System Without Exploration
title_fullStr	Off-Policy Recommendation System Without Exploration
title_full_unstemmed	Off-Policy Recommendation System Without Exploration
title_short	Off-Policy Recommendation System Without Exploration
title_sort	off-policy recommendation system without exploration
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206175/ http://dx.doi.org/10.1007/978-3-030-47426-3_2
work_keys_str_mv	AT wangchengwei offpolicyrecommendationsystemwithoutexploration AT zhoutengfei offpolicyrecommendationsystemwithoutexploration AT chenchen offpolicyrecommendationsystemwithoutexploration AT hutianlei offpolicyrecommendationsystemwithoutexploration AT chengang offpolicyrecommendationsystemwithoutexploration
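
The abstract describes GCQ only at a high level: an action generator is fit by supervised learning to mimic the logged data, and it is used to stabilize Q-learning on a fixed dataset. The following is a minimal tabular sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`fit_generator`, `allowed_actions`, `constrained_q_learning`, `recommend`), the relative-probability `threshold` rule, the toy `batch`, and the choice to restrict the Bellman maximum to generator-supported actions, which resembles batch-constrained Q-learning and may differ from how the paper applies the constraint.

```python
from collections import Counter, defaultdict


def fit_generator(batch):
    """Supervised step (assumed form): estimate p(action | state) from logged counts."""
    counts = defaultdict(Counter)
    for state, action, _reward, _next_state, _done in batch:
        counts[state][action] += 1
    return {
        s: {a: n / sum(c.values()) for a, n in c.items()}
        for s, c in counts.items()
    }


def allowed_actions(generator, state, threshold):
    """Actions whose probability is at least `threshold` times that of the most likely action."""
    probs = generator.get(state, {})
    if not probs:
        return []  # state never seen in the logged data
    top = max(probs.values())
    return [a for a, p in probs.items() if p / top >= threshold]


def constrained_q_learning(batch, gamma=0.9, lr=0.5, threshold=0.3, epochs=200):
    """Q-learning on a fixed batch; the max in the target only ranges over allowed actions."""
    generator = fit_generator(batch)
    q = defaultdict(float)
    for _ in range(epochs):
        for state, action, reward, next_state, done in batch:
            target = reward
            if not done:
                candidates = allowed_actions(generator, next_state, threshold)
                if candidates:
                    target += gamma * max(q[(next_state, a)] for a in candidates)
            q[(state, action)] += lr * (target - q[(state, action)])
    return q, generator


def recommend(q, generator, state, threshold=0.3):
    """Greedy recommendation restricted to generator-supported actions."""
    candidates = allowed_actions(generator, state, threshold)
    return max(candidates, key=lambda a: q[(state, a)]) if candidates else None


# Hypothetical logged interactions: (state, action, reward, next_state, done).
batch = [
    ("s0", "a", 1.0, "s1", False),
    ("s0", "a", 1.0, "s1", False),
    ("s0", "b", 0.0, "s1", False),
    ("s1", "a", 0.0, "end", True),
    ("s1", "b", 2.0, "end", True),
    ("s1", "b", 2.0, "end", True),
]

q, generator = constrained_q_learning(batch)
print(recommend(q, generator, "s0"), recommend(q, generator, "s1"))
```

No environment interaction happens during training: the loop only replays the fixed `batch`. Raising `threshold` narrows the candidate set toward the most frequently logged action in each state, and a state absent from the log yields no recommendation rather than an extrapolated one.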
