Cargando…

Decision making for large-scale multi-armed bandit problems using bias control of chaotic temporal waveforms in semiconductor lasers

Decision making using photonic technologies has been intensively researched for solving the multi-armed bandit problem, which is fundamental to reinforcement learning. However, these technologies are yet to be extended to large-scale multi-armed bandit problems. In this study, we conduct a numerical...

Descripción completa

Detalles Bibliográficos
Autores principales: Morijiri, Kensei, Mihana, Takatomo, Kanno, Kazutaka, Naruse, Makoto, Uchida, Atsushi
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group UK 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9110346/
https://www.ncbi.nlm.nih.gov/pubmed/35577847
http://dx.doi.org/10.1038/s41598-022-12155-y
Descripción
Sumario:Decision making using photonic technologies has been intensively researched for solving the multi-armed bandit problem, which is fundamental to reinforcement learning. However, these technologies are yet to be extended to large-scale multi-armed bandit problems. In this study, we conduct a numerical investigation of decision making to solve large-scale multi-armed bandit problems by controlling the biases of chaotic temporal waveforms generated in semiconductor lasers with optical feedback. We generate chaotic temporal waveforms using the semiconductor lasers, and each waveform is assigned to a slot machine (or choice) in the multi-armed bandit problem. The biases in the amplitudes of the chaotic waveforms are adjusted based on rewards using the tug-of-war method. Subsequently, the slot machine that yields the maximum-amplitude chaotic temporal waveform with bias is selected. The scaling properties of the correct decision-making process are examined by increasing the number of slot machines to 1024, and the scaling exponent of the power-law distribution is 0.97. We demonstrate that the proposed method outperforms existing software algorithms in terms of the scaling exponent. This result paves the way for photonic decision making in large-scale multi-armed bandit problems using photonic accelerators.