Quantum architecture search via truly proximal policy optimization

Quantum Architecture Search (QAS) is a process of automatically designing quantum circuit architectures using intelligent algorithms. Recently, Kuo et al. (Quantum architecture search via deep reinforcement learning. arXiv preprint arXiv:2104.07715, 2021) proposed a deep reinforcement learning-based Q...

Bibliographic Details
Main Authors:	Zhu, Xianchao, Hou, Xiaokai
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060432/ https://www.ncbi.nlm.nih.gov/pubmed/36991061 http://dx.doi.org/10.1038/s41598-023-32349-2

_version_	1785017095347503104
author	Zhu, Xianchao Hou, Xiaokai
author_facet	Zhu, Xianchao Hou, Xiaokai
author_sort	Zhu, Xianchao
collection	PubMed
description	Quantum Architecture Search (QAS) is a process of automatically designing quantum circuit architectures using intelligent algorithms. Recently, Kuo et al. (Quantum architecture search via deep reinforcement learning. arXiv preprint arXiv:2104.07715, 2021) proposed a deep reinforcement learning-based QAS method (QAS-PPO), which uses the Proximal Policy Optimization (PPO) algorithm to generate quantum circuits automatically, without any expert knowledge of physics. However, QAS-PPO can neither strictly limit the probability ratio between the old and new policies nor enforce a well-defined trust region constraint, which results in poor performance. In this paper, we present a new deep reinforcement learning-based QAS method, called Trust Region-based PPO with Rollback for QAS (QAS-TR-PPO-RB), which automatically builds the quantum gate sequence from the density matrix alone. Specifically, inspired by the work of Wang, we employ an improved clipping function that implements a rollback behavior to limit the probability ratio between the new policy and the old policy. In addition, we use a trust region-based triggering condition for the clipping, so that the policy is optimized while being restricted to the trust region, which leads to guaranteed monotonic improvement. Experiments on several multi-qubit circuits demonstrate that the presented method achieves better policy performance and a shorter running time than the original deep reinforcement learning-based QAS method.
format	Online Article Text
id	pubmed-10060432
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10060432 2023-03-31 Quantum architecture search via truly proximal policy optimization Zhu, Xianchao Hou, Xiaokai Sci Rep Article Nature Publishing Group UK 2023-03-29 /pmc/articles/PMC10060432/ /pubmed/36991061 http://dx.doi.org/10.1038/s41598-023-32349-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Article Zhu, Xianchao Hou, Xiaokai Quantum architecture search via truly proximal policy optimization
title	Quantum architecture search via truly proximal policy optimization
title_full	Quantum architecture search via truly proximal policy optimization
title_fullStr	Quantum architecture search via truly proximal policy optimization
title_full_unstemmed	Quantum architecture search via truly proximal policy optimization
title_short	Quantum architecture search via truly proximal policy optimization
title_sort	quantum architecture search via truly proximal policy optimization
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060432/ https://www.ncbi.nlm.nih.gov/pubmed/36991061 http://dx.doi.org/10.1038/s41598-023-32349-2
work_keys_str_mv	AT zhuxianchao quantumarchitecturesearchviatrulyproximalpolicyoptimization AT houxiaokai quantumarchitecturesearchviatrulyproximalpolicyoptimization
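
The description field above mentions a clipping function with rollback behavior that limits the probability ratio between the new and old policies. This record does not give the exact function used in the paper, so the following is only a minimal NumPy sketch of one ratio-based rollback form: inside the clipping range the ratio is passed through unchanged, and outside it the value decreases with slope `-alpha` while staying continuous at the boundary. The names `rollback_clip` and `rollback_surrogate` and the default values of `eps` and `alpha` are assumptions made for illustration, and the trust region triggering condition is not modeled here.

```python
import numpy as np


def rollback_clip(ratio, eps=0.2, alpha=0.3):
    """Ratio-based rollback clipping (illustrative form, not taken from the record).

    Inside [1 - eps, 1 + eps] the ratio is returned unchanged. Outside that
    range the output falls with slope -alpha, so pushing the ratio further
    out lowers the value instead of leaving it flat as plain clipping does.
    """
    ratio = np.asarray(ratio, dtype=float)
    # Both branches equal the ratio exactly at their boundary (continuity).
    upper = -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    lower = -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    return np.where(ratio >= 1.0 + eps, upper,
                    np.where(ratio <= 1.0 - eps, lower, ratio))


def rollback_surrogate(ratio, advantage, eps=0.2, alpha=0.3):
    """Mean pessimistic surrogate: min(r * A, rollback_clip(r) * A)."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    clipped = rollback_clip(ratio, eps, alpha)
    return float(np.minimum(ratio * advantage, clipped * advantage).mean())


if __name__ == "__main__":
    ratios = np.array([0.5, 0.9, 1.0, 1.2, 1.5])
    print(rollback_clip(ratios))
    print(rollback_surrogate(ratios, np.ones_like(ratios)))
```

With a positive advantage, a ratio beyond `1 + eps` yields a surrogate value that shrinks as the ratio grows, which gives the optimizer an incentive to move the ratio back toward the clipping range rather than merely a zero gradient.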

Similar Items