Quantum architecture search via truly proximal policy optimization
Quantum Architecture Search (QAS) is the process of automatically designing quantum circuit architectures using intelligent algorithms. Recently, Kuo et al. (Quantum architecture search via deep reinforcement learning. arXiv preprint arXiv:2104.07715, 2021) proposed a deep reinforcement learning-based Q...
Main authors: | Zhu, Xianchao; Hou, Xiaokai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060432/ https://www.ncbi.nlm.nih.gov/pubmed/36991061 http://dx.doi.org/10.1038/s41598-023-32349-2 |
_version_ | 1785017095347503104 |
---|---|
author | Zhu, Xianchao Hou, Xiaokai |
author_facet | Zhu, Xianchao Hou, Xiaokai |
author_sort | Zhu, Xianchao |
collection | PubMed |
description | Quantum Architecture Search (QAS) is the process of automatically designing quantum circuit architectures using intelligent algorithms. Recently, Kuo et al. (Quantum architecture search via deep reinforcement learning. arXiv preprint arXiv:2104.07715, 2021) proposed a deep reinforcement learning-based QAS method (QAS-PPO), which uses the Proximal Policy Optimization (PPO) algorithm to generate quantum circuits automatically, without any expert knowledge of physics. However, QAS-PPO can neither strictly limit the probability ratio between the old and new policies nor enforce a well-defined trust-region constraint, which results in poor performance. In this paper, we present a new deep reinforcement learning-based QAS method, called Trust Region-based PPO with Rollback for QAS (QAS-TR-PPO-RB), which automatically builds the quantum gate sequence from the density matrix alone. Specifically, inspired by the work of Wang, we employ an improved clipping function that implements a rollback behavior to limit the probability ratio between the new policy and the old policy. In addition, we use a trust-region-based triggering condition for the clipping, which restricts the policy to the trust region and leads to guaranteed monotonic improvement. Experiments on several multi-qubit circuits demonstrate that the proposed method achieves better policy performance and lower running time than the original deep reinforcement learning-based QAS method. |
format | Online Article Text |
id | pubmed-10060432 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10060432 2023-03-31 Quantum architecture search via truly proximal policy optimization Zhu, Xianchao Hou, Xiaokai Sci Rep Article Nature Publishing Group UK 2023-03-29 /pmc/articles/PMC10060432/ /pubmed/36991061 http://dx.doi.org/10.1038/s41598-023-32349-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Zhu, Xianchao Hou, Xiaokai Quantum architecture search via truly proximal policy optimization |
title | Quantum architecture search via truly proximal policy optimization |
title_full | Quantum architecture search via truly proximal policy optimization |
title_fullStr | Quantum architecture search via truly proximal policy optimization |
title_full_unstemmed | Quantum architecture search via truly proximal policy optimization |
title_short | Quantum architecture search via truly proximal policy optimization |
title_sort | quantum architecture search via truly proximal policy optimization |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060432/ https://www.ncbi.nlm.nih.gov/pubmed/36991061 http://dx.doi.org/10.1038/s41598-023-32349-2 |
work_keys_str_mv | AT zhuxianchao quantumarchitecturesearchviatrulyproximalpolicyoptimization AT houxiaokai quantumarchitecturesearchviatrulyproximalpolicyoptimization |
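
The description field above names two mechanisms: a clipping function with rollback behavior that limits the probability ratio, and a trust-region condition that triggers the clipping. The record itself gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a piecewise-linear rollback function with slope `-alpha` outside the interval `[1 - eps, 1 + eps]`, a KL-divergence threshold `delta` as the trust-region trigger, and a discrete action distribution. All function names and default values are hypothetical.

```python
import math


def rollback_clip(ratio, eps=0.2, alpha=0.3):
    """Assumed rollback function: identity inside [1-eps, 1+eps],
    a line with negative slope -alpha outside it (continuous at the edges)."""
    if ratio <= 1.0 - eps:
        return -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    if ratio >= 1.0 + eps:
        return -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    return ratio


def ppo_rb_objective(ratio, advantage, eps=0.2, alpha=0.3):
    """Per-sample surrogate: pessimistic minimum of the unclipped term and
    the rollback term, mirroring the min(...) structure of clipped PPO."""
    return min(ratio * advantage, rollback_clip(ratio, eps, alpha) * advantage)


def kl_categorical(p_old, p_new):
    """KL(p_old || p_new) for two discrete action distributions."""
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


def tr_rollback_objective(p_old, p_new, action, advantage, delta=0.03, alpha=5.0):
    """Assumed trust-region variant: the rollback penalty is triggered by the
    KL divergence leaving the trust region, not by the ratio leaving a band.
    Constant offsets that do not affect the gradient are dropped."""
    ratio = p_new[action] / p_old[action]
    kl = kl_categorical(p_old, p_new)
    surrogate = ratio * advantage
    # Penalize only when the policy is outside the trust region AND the
    # surrogate already improved on the old policy (whose ratio is 1).
    if kl >= delta and surrogate >= advantage:
        return surrogate - alpha * kl
    return surrogate


if __name__ == "__main__":
    print(ppo_rb_objective(1.5, 1.0))
    print(tr_rollback_objective([0.5, 0.5], [0.9, 0.1], 0, 1.0))
```

In this sketch, the negative slope means that pushing the ratio further outside the allowed range lowers the objective, so the gradient pulls the policy back instead of merely vanishing as it does under plain clipping. In the second function, the same pull-back is driven by the KL divergence once it exceeds `delta`.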