
How to train a self-driving vehicle: On the added value (or lack thereof) of curriculum learning and replay buffers


Bibliographic Details
Main Authors: Mahmoud, Sara, Billing, Erik, Svensson, Henrik, Thill, Serge
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9905678/
https://www.ncbi.nlm.nih.gov/pubmed/36762255
http://dx.doi.org/10.3389/frai.2023.1098982
_version_ 1784883849852878848
author Mahmoud, Sara
Billing, Erik
Svensson, Henrik
Thill, Serge
author_facet Mahmoud, Sara
Billing, Erik
Svensson, Henrik
Thill, Serge
author_sort Mahmoud, Sara
collection PubMed
description Learning from only real-world collected data can be unrealistic and time consuming in many scenarios. One alternative is to use synthetic data as learning environments to learn rare situations, and replay buffers to speed up the learning. In this work, we examine how the creation of the environment affects the training of a reinforcement learning agent through auto-generated environment mechanisms. We take the autonomous vehicle as an application. We compare the effect of two approaches to generate training data for artificial cognitive agents. We consider the added value of curriculum learning—just as in human learning—as a way to structure novel training data that the agent has not seen before, as well as that of using a replay buffer to train further on data the agent has seen before. In other words, the focus of this paper is on characteristics of the training data rather than on learning algorithms. We therefore use two tasks that are commonly trained early on in autonomous vehicle research: lane keeping and pedestrian avoidance. Our main results show that curriculum learning indeed offers an additional benefit over a vanilla reinforcement learning approach (using Deep-Q Learning), but the replay buffer actually has a detrimental effect in most (but not all) combinations of data generation approaches we considered here. The benefit of curriculum learning does depend on the existence of a well-defined difficulty metric with which various training scenarios can be ordered. In the lane-keeping task, we can define it as a function of the curvature of the road, where the steeper and more frequent the curves, the more difficult the task becomes. Defining such a difficulty metric in other scenarios is not always trivial. In general, the results of this paper emphasize both the importance of considering data characterization, such as curriculum learning, and the importance of defining an appropriate metric for the task.
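The abstract describes two ingredients: a difficulty metric over road curvature used to order scenarios for curriculum learning, and a replay buffer for re-training on previously seen data. The sketch below is a hypothetical illustration of those ideas, not the authors' implementation: the `road_difficulty` formula (mean curve sharpness scaled by how often curves occur) and the scenario names are assumptions chosen to match the abstract's informal description.

```python
import random
from collections import deque

def road_difficulty(curvatures):
    """Hypothetical difficulty metric: mean absolute curvature (how steep
    the curves are), scaled up by the fraction of curved segments (how
    often curves occur). A straight road scores 0.0."""
    if not curvatures:
        return 0.0
    sharpness = sum(abs(k) for k in curvatures) / len(curvatures)
    frequency = sum(1 for k in curvatures if abs(k) > 1e-3) / len(curvatures)
    return sharpness * (1.0 + frequency)

def curriculum_order(scenarios):
    """Order auto-generated road scenarios from easiest to hardest."""
    return sorted(scenarios, key=lambda s: road_difficulty(s["curvatures"]))

class ReplayBuffer:
    """Minimal FIFO replay buffer: stores past transitions so the agent
    can train further on data it has already seen."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Three invented road scenarios, each a list of per-segment curvatures.
scenarios = [
    {"name": "sharp_frequent", "curvatures": [0.3, -0.4, 0.5, -0.3]},
    {"name": "straight",       "curvatures": [0.0, 0.0, 0.0, 0.0]},
    {"name": "gentle",         "curvatures": [0.05, 0.0, -0.05, 0.0]},
]
ordered = curriculum_order(scenarios)
print([s["name"] for s in ordered])  # → ['straight', 'gentle', 'sharp_frequent']
```

With such an ordering in hand, a curriculum schedule simply presents the agent with `ordered[0]` first and advances to harder scenarios as performance improves; the point made in the paper is that this only works when the difficulty metric itself is well defined for the task.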
format Online
Article
Text
id pubmed-9905678
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9905678 2023-02-08 How to train a self-driving vehicle: On the added value (or lack thereof) of curriculum learning and replay buffers Mahmoud, Sara Billing, Erik Svensson, Henrik Thill, Serge Front Artif Intell Artificial Intelligence Frontiers Media S.A. 2023-01-25 /pmc/articles/PMC9905678/ /pubmed/36762255 http://dx.doi.org/10.3389/frai.2023.1098982 Text en Copyright © 2023 Mahmoud, Billing, Svensson and Thill. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Artificial Intelligence
Mahmoud, Sara
Billing, Erik
Svensson, Henrik
Thill, Serge
How to train a self-driving vehicle: On the added value (or lack thereof) of curriculum learning and replay buffers
title How to train a self-driving vehicle: On the added value (or lack thereof) of curriculum learning and replay buffers
title_full How to train a self-driving vehicle: On the added value (or lack thereof) of curriculum learning and replay buffers
title_fullStr How to train a self-driving vehicle: On the added value (or lack thereof) of curriculum learning and replay buffers
title_full_unstemmed How to train a self-driving vehicle: On the added value (or lack thereof) of curriculum learning and replay buffers
title_short How to train a self-driving vehicle: On the added value (or lack thereof) of curriculum learning and replay buffers
title_sort how to train a self-driving vehicle: on the added value (or lack thereof) of curriculum learning and replay buffers
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9905678/
https://www.ncbi.nlm.nih.gov/pubmed/36762255
http://dx.doi.org/10.3389/frai.2023.1098982
work_keys_str_mv AT mahmoudsara howtotrainaselfdrivingvehicleontheaddedvalueorlackthereofofcurriculumlearningandreplaybuffers
AT billingerik howtotrainaselfdrivingvehicleontheaddedvalueorlackthereofofcurriculumlearningandreplaybuffers
AT svenssonhenrik howtotrainaselfdrivingvehicleontheaddedvalueorlackthereofofcurriculumlearningandreplaybuffers
AT thillserge howtotrainaselfdrivingvehicleontheaddedvalueorlackthereofofcurriculumlearningandreplaybuffers