Cargando…

Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem

In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Consideri...

Descripción completa

Detalles Bibliográficos
Autores principales:	Zhang, Jin, Liu, Qing, Han, XiaoHang
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Public Library of Science 2023
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10030033/ https://www.ncbi.nlm.nih.gov/pubmed/36943840 http://dx.doi.org/10.1371/journal.pone.0283207

_version_	1784910271465127936
author	Zhang, Jin Liu, Qing Han, XiaoHang
author_facet	Zhang, Jin Liu, Qing Han, XiaoHang
author_sort	Zhang, Jin
collection	PubMed
description	In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Considering its problems of slow convergence and low accuracy, four strategies within the QL framework are designed first: the weighting function-based reward matrix, the power function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula. Then, a self-adaptive beam search Q-learning (ABSQL) algorithm is designed. To solve the problem that the sub-route is not fully optimized in the ABSQL algorithm, a dynamic sub-route optimization strategy is introduced outside the QL framework, and then the DSRABSQL algorithm is designed. Experiments are conducted to compare QL, ABSQL, DSRABSQL, our previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The experimental results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms are designed based on the QL and DSRABSQL algorithms to test the effectiveness of the five strategies. From the experimental results, it can be found that the dynamic sub-route optimization strategy and self-adaptive ε-beam search strategy contribute the most for small-, medium-, and large-scale instances. At the same time, collaboration exists between the four strategies within the QL framework, which increases with the expansion of the instance scale.
format	Online Article Text
id	pubmed-10030033
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-100300332023-03-22 Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem Zhang, Jin Liu, Qing Han, XiaoHang PLoS One Research Article In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Considering its problems of slow convergence and low accuracy, four strategies within the QL framework are designed first: the weighting function-based reward matrix, the power function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula. Then, a self-adaptive beam search Q-learning (ABSQL) algorithm is designed. To solve the problem that the sub-route is not fully optimized in the ABSQL algorithm, a dynamic sub-route optimization strategy is introduced outside the QL framework, and then the DSRABSQL algorithm is designed. Experiments are conducted to compare QL, ABSQL, DSRABSQL, our previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The experimental results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms are designed based on the QL and DSRABSQL algorithms to test the effectiveness of the five strategies. From the experimental results, it can be found that the dynamic sub-route optimization strategy and self-adaptive ε-beam search strategy contribute the most for small-, medium-, and large-scale instances. At the same time, collaboration exists between the four strategies within the QL framework, which increases with the expansion of the instance scale. Public Library of Science 2023-03-21 /pmc/articles/PMC10030033/ /pubmed/36943840 http://dx.doi.org/10.1371/journal.pone.0283207 Text en © 2023 Zhang et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Zhang, Jin Liu, Qing Han, XiaoHang Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title	Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_full	Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_fullStr	Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_full_unstemmed	Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_short	Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem
title_sort	dynamic sub-route-based self-adaptive beam search q-learning algorithm for traveling salesman problem
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10030033/ https://www.ncbi.nlm.nih.gov/pubmed/36943840 http://dx.doi.org/10.1371/journal.pone.0283207
work_keys_str_mv	AT zhangjin dynamicsubroutebasedselfadaptivebeamsearchqlearningalgorithmfortravelingsalesmanproblem AT liuqing dynamicsubroutebasedselfadaptivebeamsearchqlearningalgorithmfortravelingsalesmanproblem AT hanxiaohang dynamicsubroutebasedselfadaptivebeamsearchqlearningalgorithmfortravelingsalesmanproblem

Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem

Ejemplares similares