
PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function

Autonomous driving with artificial intelligence technology has been viewed as promising for autonomous vehicles hitting the road in the near future. In recent years, considerable progress has been made with Deep Reinforcement Learning (DRL) toward realizing end-to-end autonomous driving. Still, driving safely and comfortably in real dynamic scenarios with DRL is nontrivial because the reward functions are typically pre-defined with expertise. This paper proposes a human-in-the-loop DRL algorithm for learning personalized autonomous driving behavior in a progressive manner. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework, which is called PORF-DDPG in this paper. PORF consists of two parts: the first part is a pre-defined typical reward function on the system state; the second part is modeled as a Deep Neural Network (DNN) representing the driving-adjustment intention of the human observer, which is the main contribution of this paper. The DNN-based reward model is progressively learned using front-view images as the input, via active human supervision and intervention. The proposed approach is potentially useful for driving in dynamic constrained scenarios where dangerous collision events might occur frequently with classic DRL. The experimental results show that the proposed autonomous driving behavior learning method exhibits online learning capability and environmental adaptability.


Bibliographic Details
Main Authors: Chen, Jie, Wu, Tao, Shi, Meiping, Jiang, Wei
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7582292/
https://www.ncbi.nlm.nih.gov/pubmed/33019643
http://dx.doi.org/10.3390/s20195626
_version_ 1783599157351546880
author Chen, Jie
Wu, Tao
Shi, Meiping
Jiang, Wei
author_facet Chen, Jie
Wu, Tao
Shi, Meiping
Jiang, Wei
author_sort Chen, Jie
collection PubMed
description Autonomous driving with artificial intelligence technology has been viewed as promising for autonomous vehicles hitting the road in the near future. In recent years, considerable progress has been made with Deep Reinforcement Learning (DRL) toward realizing end-to-end autonomous driving. Still, driving safely and comfortably in real dynamic scenarios with DRL is nontrivial because the reward functions are typically pre-defined with expertise. This paper proposes a human-in-the-loop DRL algorithm for learning personalized autonomous driving behavior in a progressive manner. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework, which is called PORF-DDPG in this paper. PORF consists of two parts: the first part is a pre-defined typical reward function on the system state; the second part is modeled as a Deep Neural Network (DNN) representing the driving-adjustment intention of the human observer, which is the main contribution of this paper. The DNN-based reward model is progressively learned using front-view images as the input, via active human supervision and intervention. The proposed approach is potentially useful for driving in dynamic constrained scenarios where dangerous collision events might occur frequently with classic DRL. The experimental results show that the proposed autonomous driving behavior learning method exhibits online learning capability and environmental adaptability.
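The description above characterizes PORF as the sum of two terms: a pre-defined reward on the system state, plus a DNN term learned from human supervision of front-view images. The following is a minimal illustrative sketch of that composition only, not the authors' implementation; all names (`base_reward`, `learned_adjustment`, `porf_reward`) and the specific penalty coefficients are hypothetical, and a simple linear model stands in for the DNN term.

```python
# Illustrative sketch of the PORF reward composition (hypothetical names
# and coefficients; a linear model stands in for the paper's DNN term).

def base_reward(speed, lane_offset, collided):
    """First PORF term: a hand-crafted reward on the system state."""
    if collided:
        return -100.0  # large penalty for collision events
    # Reward forward progress; penalize deviation from the lane center.
    return speed - 2.0 * abs(lane_offset)

def learned_adjustment(image_features, weights):
    """Second PORF term: stand-in for the DNN that maps front-view
    image features to the human observer's adjustment intention,
    progressively fit from supervision and intervention signals."""
    return sum(w * x for w, x in zip(weights, image_features))

def porf_reward(speed, lane_offset, collided, image_features, weights):
    """PORF: pre-defined state reward plus the learned human-intention term."""
    return (base_reward(speed, lane_offset, collided)
            + learned_adjustment(image_features, weights))

# Example: moderate speed, slight lane offset, no collision,
# with a small learned correction from two image features.
r = porf_reward(5.0, 0.3, False, [0.2, -0.1], [1.0, 0.5])
```

In the paper's scheme the second term is retrained as new human interventions arrive, so the total reward shifts toward the observer's preferences while the hand-crafted term keeps the baseline driving objectives fixed.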
format Online
Article
Text
id pubmed-7582292
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7582292 2020-10-28 PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function Chen, Jie Wu, Tao Shi, Meiping Jiang, Wei Sensors (Basel) Article Autonomous driving with artificial intelligence technology has been viewed as promising for autonomous vehicles hitting the road in the near future. In recent years, considerable progress has been made with Deep Reinforcement Learning (DRL) toward realizing end-to-end autonomous driving. Still, driving safely and comfortably in real dynamic scenarios with DRL is nontrivial because the reward functions are typically pre-defined with expertise. This paper proposes a human-in-the-loop DRL algorithm for learning personalized autonomous driving behavior in a progressive manner. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework, which is called PORF-DDPG in this paper. PORF consists of two parts: the first part is a pre-defined typical reward function on the system state; the second part is modeled as a Deep Neural Network (DNN) representing the driving-adjustment intention of the human observer, which is the main contribution of this paper. The DNN-based reward model is progressively learned using front-view images as the input, via active human supervision and intervention. The proposed approach is potentially useful for driving in dynamic constrained scenarios where dangerous collision events might occur frequently with classic DRL. The experimental results show that the proposed autonomous driving behavior learning method exhibits online learning capability and environmental adaptability. MDPI 2020-10-01 /pmc/articles/PMC7582292/ /pubmed/33019643 http://dx.doi.org/10.3390/s20195626 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Chen, Jie
Wu, Tao
Shi, Meiping
Jiang, Wei
PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function
title PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function
title_full PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function
title_fullStr PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function
title_full_unstemmed PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function
title_short PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function
title_sort porf-ddpg: learning personalized autonomous driving behavior with progressively optimized reward function
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7582292/
https://www.ncbi.nlm.nih.gov/pubmed/33019643
http://dx.doi.org/10.3390/s20195626
work_keys_str_mv AT chenjie porfddpglearningpersonalizedautonomousdrivingbehaviorwithprogressivelyoptimizedrewardfunction
AT wutao porfddpglearningpersonalizedautonomousdrivingbehaviorwithprogressivelyoptimizedrewardfunction
AT shimeiping porfddpglearningpersonalizedautonomousdrivingbehaviorwithprogressivelyoptimizedrewardfunction
AT jiangwei porfddpglearningpersonalizedautonomousdrivingbehaviorwithprogressivelyoptimizedrewardfunction