Cargando…

Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem

In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Consideri...

Descripción completa

Detalles Bibliográficos
Autores principales: Zhang, Jin, Liu, Qing, Han, XiaoHang
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10030033/
https://www.ncbi.nlm.nih.gov/pubmed/36943840
http://dx.doi.org/10.1371/journal.pone.0283207
_version_ 1784910271465127936
author Zhang, Jin
Liu, Qing
Han, XiaoHang
author_facet Zhang, Jin
Liu, Qing
Han, XiaoHang
author_sort Zhang, Jin
collection PubMed
description In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Considering its problems of slow convergence and low accuracy, four strategies within the QL framework are designed first: the weighting function-based reward matrix, the power function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula. Then, a self-adaptive beam search Q-learning (ABSQL) algorithm is designed. To solve the problem that the sub-route is not fully optimized in the ABSQL algorithm, a dynamic sub-route optimization strategy is introduced outside the QL framework, and then the DSRABSQL algorithm is designed. Experiments are conducted to compare QL, ABSQL, DSRABSQL, our previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The experimental results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms are designed based on the QL and DSRABSQL algorithms to test the effectiveness of the five strategies. From the experimental results, it can be found that the dynamic sub-route optimization strategy and self-adaptive ε-beam search strategy contribute the most for small-, medium-, and large-scale instances. At the same time, collaboration exists between the four strategies within the QL framework, which increases with the expansion of the instance scale.
format Online
Article
Text
id pubmed-10030033
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-100300332023-03-22 Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem Zhang, Jin Liu, Qing Han, XiaoHang PLoS One Research Article In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Considering its problems of slow convergence and low accuracy, four strategies within the QL framework are designed first: the weighting function-based reward matrix, the power function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula. Then, a self-adaptive beam search Q-learning (ABSQL) algorithm is designed. To solve the problem that the sub-route is not fully optimized in the ABSQL algorithm, a dynamic sub-route optimization strategy is introduced outside the QL framework, and then the DSRABSQL algorithm is designed. Experiments are conducted to compare QL, ABSQL, DSRABSQL, our previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The experimental results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms are designed based on the QL and DSRABSQL algorithms to test the effectiveness of the five strategies. From the experimental results, it can be found that the dynamic sub-route optimization strategy and self-adaptive ε-beam search strategy contribute the most for small-, medium-, and large-scale instances. At the same time, collaboration exists between the four strategies within the QL framework, which increases with the expansion of the instance scale. Public Library of Science 2023-03-21 /pmc/articles/PMC10030033/ /pubmed/36943840 http://dx.doi.org/10.1371/journal.pone.0283207 Text en © 2023 Zhang et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Zhang, Jin
Liu, Qing
Han, XiaoHang
Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_full Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_fullStr Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_full_unstemmed Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_short Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_sort dynamic sub-route-based self-adaptive beam search q-learning algorithm for traveling salesman problem
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10030033/
https://www.ncbi.nlm.nih.gov/pubmed/36943840
http://dx.doi.org/10.1371/journal.pone.0283207
work_keys_str_mv AT zhangjin dynamicsubroutebasedselfadaptivebeamsearchqlearningalgorithmfortravelingsalesmanproblem
AT liuqing dynamicsubroutebasedselfadaptivebeamsearchqlearningalgorithmfortravelingsalesmanproblem
AT hanxiaohang dynamicsubroutebasedselfadaptivebeamsearchqlearningalgorithmfortravelingsalesmanproblem