
Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play

Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes and they constitute a significant challenge for conventional single-objective reinforce...

Bibliographic Details
Main Authors: Abdelfattah, Sherif; Kasmarik, Kathryn; Hu, Jiankun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2018
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6189603/
https://www.ncbi.nlm.nih.gov/pubmed/30356836
http://dx.doi.org/10.3389/fnbot.2018.00065
_version_ 1783363393330085888
author Abdelfattah, Sherif
Kasmarik, Kathryn
Hu, Jiankun
author_facet Abdelfattah, Sherif
Kasmarik, Kathryn
Hu, Jiankun
author_sort Abdelfattah, Sherif
collection PubMed
description Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes, and they constitute a significant challenge for conventional single-objective reinforcement learning methods, especially when an optimal compromise cannot be determined beforehand. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user's preference in solving the problem. However, this comes at the cost of computational complexity, time consumption, and a lack of adaptability to non-stationary environment dynamics. To address these limitations, there is a need for adaptive methods that can solve the problem in an online and robust manner. In this paper, we propose a novel developmental method that utilizes adversarial self-play between an intrinsically motivated preference exploration component and a policy coverage set optimization component, which robustly evolves a convex coverage set of policies to solve the problem using preferences proposed by the former component. We experimentally show the effectiveness of the proposed method in comparison to state-of-the-art multi-objective reinforcement learning methods in stationary and non-stationary environments.
format Online
Article
Text
id pubmed-6189603
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6189603 2018-10-23 Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play Abdelfattah, Sherif Kasmarik, Kathryn Hu, Jiankun Front Neurorobot Neuroscience Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes, and they constitute a significant challenge for conventional single-objective reinforcement learning methods, especially when an optimal compromise cannot be determined beforehand. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user's preference in solving the problem. However, this comes at the cost of computational complexity, time consumption, and a lack of adaptability to non-stationary environment dynamics. To address these limitations, there is a need for adaptive methods that can solve the problem in an online and robust manner. In this paper, we propose a novel developmental method that utilizes adversarial self-play between an intrinsically motivated preference exploration component and a policy coverage set optimization component, which robustly evolves a convex coverage set of policies to solve the problem using preferences proposed by the former component. We experimentally show the effectiveness of the proposed method in comparison to state-of-the-art multi-objective reinforcement learning methods in stationary and non-stationary environments. Frontiers Media S.A. 2018-10-09 /pmc/articles/PMC6189603/ /pubmed/30356836 http://dx.doi.org/10.3389/fnbot.2018.00065 Text en Copyright © 2018 Abdelfattah, Kasmarik and Hu. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Abdelfattah, Sherif
Kasmarik, Kathryn
Hu, Jiankun
Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play
title Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play
title_full Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play
title_fullStr Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play
title_full_unstemmed Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play
title_short Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play
title_sort evolving robust policy coverage sets in multi-objective markov decision processes through intrinsically motivated self-play
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6189603/
https://www.ncbi.nlm.nih.gov/pubmed/30356836
http://dx.doi.org/10.3389/fnbot.2018.00065
work_keys_str_mv AT abdelfattahsherif evolvingrobustpolicycoveragesetsinmultiobjectivemarkovdecisionprocessesthroughintrinsicallymotivatedselfplay
AT kasmarikkathryn evolvingrobustpolicycoveragesetsinmultiobjectivemarkovdecisionprocessesthroughintrinsicallymotivatedselfplay
AT hujiankun evolvingrobustpolicycoveragesetsinmultiobjectivemarkovdecisionprocessesthroughintrinsicallymotivatedselfplay