
Quantum architecture search via truly proximal policy optimization

Quantum Architecture Search (QAS) is the process of automatically designing quantum circuit architectures using intelligent algorithms. Recently, Kuo et al. (Quantum architecture search via deep reinforcement learning. arXiv preprint arXiv:2104.07715, 2021) proposed a deep reinforcement learning-based Q...


Bibliographic Details
Main authors: Zhu, Xianchao; Hou, Xiaokai
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060432/
https://www.ncbi.nlm.nih.gov/pubmed/36991061
http://dx.doi.org/10.1038/s41598-023-32349-2
_version_ 1785017095347503104
author Zhu, Xianchao
Hou, Xiaokai
author_facet Zhu, Xianchao
Hou, Xiaokai
author_sort Zhu, Xianchao
collection PubMed
description Quantum Architecture Search (QAS) is the process of automatically designing quantum circuit architectures using intelligent algorithms. Recently, Kuo et al. (Quantum architecture search via deep reinforcement learning. arXiv preprint arXiv:2104.07715, 2021) proposed a deep reinforcement learning-based QAS method (QAS-PPO), which uses the Proximal Policy Optimization (PPO) algorithm to automatically generate quantum circuits without any expert knowledge of physics. However, QAS-PPO can neither strictly limit the probability ratio between the old and new policies nor enforce a well-defined trust-region constraint, resulting in poor performance. In this paper, we present a new deep reinforcement learning-based QAS method, called Trust Region-based PPO with Rollback for QAS (QAS-TR-PPO-RB), to automatically build the quantum gate sequence from the density matrix only. Specifically, inspired by the work of Wang, we employ an improved clipping function that implements rollback behavior to limit the probability ratio between the new policy and the old policy. In addition, we use a trust-region-based triggering condition for the clipping, which optimizes the policy while restricting it to the trust region and leads to guaranteed monotonic improvement. Experiments on several multi-qubit circuits demonstrate that the presented method achieves better policy performance and lower running time than the original deep reinforcement learning-based QAS method.
format Online
Article
Text
id pubmed-10060432
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10060432 2023-03-31 Quantum architecture search via truly proximal policy optimization Zhu, Xianchao Hou, Xiaokai Sci Rep Article
Nature Publishing Group UK 2023-03-29 /pmc/articles/PMC10060432/ /pubmed/36991061 http://dx.doi.org/10.1038/s41598-023-32349-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Zhu, Xianchao
Hou, Xiaokai
Quantum architecture search via truly proximal policy optimization
title Quantum architecture search via truly proximal policy optimization
title_sort quantum architecture search via truly proximal policy optimization
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060432/
https://www.ncbi.nlm.nih.gov/pubmed/36991061
http://dx.doi.org/10.1038/s41598-023-32349-2
work_keys_str_mv AT zhuxianchao quantumarchitecturesearchviatrulyproximalpolicyoptimization
AT houxiaokai quantumarchitecturesearchviatrulyproximalpolicyoptimization
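
The abstract contrasts standard PPO clipping with a trust-region-triggered rollback. The difference can be illustrated with a minimal per-sample sketch. This is not the authors' implementation: the function names, the default values of `eps`, `delta`, and `alpha`, and the exact form of the penalty are assumptions made for illustration, loosely following the general shape of trust-region rollback objectives, with constant offsets that do not affect gradients omitted.

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); eps is the clipping range.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def tr_rollback_objective(ratio, advantage, kl, delta=0.03, alpha=5.0):
    """Hypothetical trust-region surrogate with rollback for one sample.

    kl is the KL divergence between the old and new policies at this state.
    delta (trust-region radius) and alpha (rollback strength) are
    illustrative values, not taken from the source.
    """
    surrogate = ratio * advantage
    # The old policy has ratio 1, so its surrogate value is just `advantage`.
    left_trust_region = kl >= delta
    still_improving = surrogate >= advantage
    if left_trust_region and still_improving:
        # Rollback: penalise the KL term so the gradient pulls the new
        # policy back toward the trust region instead of merely vanishing.
        return surrogate - alpha * kl
    return surrogate


if __name__ == "__main__":
    # Same sample, small versus large policy change.
    print(ppo_clip_objective(1.5, 1.0))
    print(tr_rollback_objective(1.5, 1.0, kl=0.01))
    print(tr_rollback_objective(1.5, 1.0, kl=0.10))
```

In this sketch, plain clipping only flattens the objective once the ratio leaves the clipping range, whereas the rollback term actively lowers the objective when the KL divergence exceeds `delta`, which is the behaviour the abstract attributes to the rollback and trust-region trigger.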