Cargando…
Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Consideri...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Public Library of Science
2023
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10030033/ https://www.ncbi.nlm.nih.gov/pubmed/36943840 http://dx.doi.org/10.1371/journal.pone.0283207 |
_version_ | 1784910271465127936 |
---|---|
author | Zhang, Jin Liu, Qing Han, XiaoHang |
author_facet | Zhang, Jin Liu, Qing Han, XiaoHang |
author_sort | Zhang, Jin |
collection | PubMed |
description | In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Considering its problems of slow convergence and low accuracy, four strategies within the QL framework are designed first: the weighting function-based reward matrix, the power function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula. Then, a self-adaptive beam search Q-learning (ABSQL) algorithm is designed. To solve the problem that the sub-route is not fully optimized in the ABSQL algorithm, a dynamic sub-route optimization strategy is introduced outside the QL framework, and then the DSRABSQL algorithm is designed. Experiments are conducted to compare QL, ABSQL, DSRABSQL, our previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The experimental results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms are designed based on the QL and DSRABSQL algorithms to test the effectiveness of the five strategies. From the experimental results, it can be found that the dynamic sub-route optimization strategy and self-adaptive ε-beam search strategy contribute the most for small-, medium-, and large-scale instances. At the same time, collaboration exists between the four strategies within the QL framework, which increases with the expansion of the instance scale. |
format | Online Article Text |
id | pubmed-10030033 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-100300332023-03-22 Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem Zhang, Jin Liu, Qing Han, XiaoHang PLoS One Research Article In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Considering its problems of slow convergence and low accuracy, four strategies within the QL framework are designed first: the weighting function-based reward matrix, the power function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula. Then, a self-adaptive beam search Q-learning (ABSQL) algorithm is designed. To solve the problem that the sub-route is not fully optimized in the ABSQL algorithm, a dynamic sub-route optimization strategy is introduced outside the QL framework, and then the DSRABSQL algorithm is designed. Experiments are conducted to compare QL, ABSQL, DSRABSQL, our previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The experimental results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms are designed based on the QL and DSRABSQL algorithms to test the effectiveness of the five strategies. From the experimental results, it can be found that the dynamic sub-route optimization strategy and self-adaptive ε-beam search strategy contribute the most for small-, medium-, and large-scale instances. At the same time, collaboration exists between the four strategies within the QL framework, which increases with the expansion of the instance scale. Public Library of Science 2023-03-21 /pmc/articles/PMC10030033/ /pubmed/36943840 http://dx.doi.org/10.1371/journal.pone.0283207 Text en © 2023 Zhang et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Zhang, Jin Liu, Qing Han, XiaoHang Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem |
title | Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem |
title_full | Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem |
title_fullStr | Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem |
title_full_unstemmed | Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem |
title_short | Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem |
title_sort | dynamic sub-route-based self-adaptive beam search q-learning algorithm for traveling salesman problem |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10030033/ https://www.ncbi.nlm.nih.gov/pubmed/36943840 http://dx.doi.org/10.1371/journal.pone.0283207 |
work_keys_str_mv | AT zhangjin dynamicsubroutebasedselfadaptivebeamsearchqlearningalgorithmfortravelingsalesmanproblem AT liuqing dynamicsubroutebasedselfadaptivebeamsearchqlearningalgorithmfortravelingsalesmanproblem AT hanxiaohang dynamicsubroutebasedselfadaptivebeamsearchqlearningalgorithmfortravelingsalesmanproblem |