
Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds

Bibliographic details
Main authors: Morishita, Toshiharu; Kaneko, Hiromasa
Format: Online article, text
Language: English
Published: American Chemical Society, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850731/
https://www.ncbi.nlm.nih.gov/pubmed/36687084
http://dx.doi.org/10.1021/acsomega.2c05145
author Morishita, Toshiharu
Kaneko, Hiromasa
collection PubMed
description An efficient search for optimal solutions in Bayesian optimization (BO) requires appropriate initial samples for building the Gaussian process regression model. For general experimental designs in which the explanatory variables x contain no compounds or molecular descriptors, selecting initial samples with a larger D-optimality yields little correlation among the x values of the selected samples, which leads to effective regression model building. However, in experimental designs involving compounds, molecular descriptors calculated from chemical structures are always highly correlated, and compounds with similar structures form clusters in the chemical space. Selecting the initial samples uniformly from each cluster is therefore desirable for obtaining initial samples that carry maximum information on the experimental conditions. Because D-optimality does not work well with highly correlated molecular descriptors and does not use cluster information during sample selection, we propose an initial sample selection method based on clustering and apply it to the optimization of coupling reaction conditions with BO. We confirm that the proposed method reaches the optimal solution with up to 5% fewer experiments than random sampling or sampling based on D-optimality. This study contributes to initial sample selection methods for BO, and we are convinced that the proposed method improves the search performance of BO in various fields of science and technology, provided that initial samples can be determined using cluster information appropriately formed with domain knowledge.
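The abstract contrasts D-optimality with cluster-based initial sample selection. The following is a minimal Python/NumPy sketch of both ideas, not the authors' implementation: it assumes autoscaled descriptors, takes the D-optimality criterion as log det(X^T X) of the selected rows, and swaps in plain k-means as the clustering step, drawing an equal number of random samples from each cluster. The clustering algorithm, descriptors, and sample counts used in the paper are not stated in this record, so the function names (autoscale, d_optimality, kmeans, select_per_cluster) and the synthetic three-cluster data are hypothetical.

```python
import numpy as np


def autoscale(x):
    """Center each descriptor and scale it to unit variance (zero-variance columns are left unscaled)."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    return (x - x.mean(axis=0)) / std


def d_optimality(x_selected):
    """log det(X^T X) of the selected rows; larger means a less correlated, more informative design."""
    sign, logdet = np.linalg.slogdet(x_selected.T @ x_selected)
    # A non-positive sign means X^T X is singular (up to rounding), so treat it as the worst design.
    return logdet if sign > 0 else -np.inf


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with farthest-point initialisation; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # Next center: the point whose distance to its nearest chosen center is largest.
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members; keep it in place if the cluster is empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def select_per_cluster(labels, n_per_cluster, seed=0):
    """Draw the same number of rows at random from every cluster (fewer if a cluster is small)."""
    rng = np.random.default_rng(seed)
    chosen = []
    for j in np.unique(labels):
        members = np.flatnonzero(labels == j)
        take = min(n_per_cluster, len(members))
        chosen.extend(rng.choice(members, size=take, replace=False).tolist())
    return sorted(chosen)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical candidate pool: three structural "families" of 20 compounds each.
    blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    base = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in blob_centers])
    # A third descriptor almost collinear with the first, mimicking correlated molecular descriptors.
    x_raw = np.column_stack([base, base[:, 0] + rng.normal(scale=1e-3, size=len(base))])
    x = autoscale(x_raw)

    labels = kmeans(x, k=3)
    selected = select_per_cluster(labels, n_per_cluster=2)
    print("selected rows:", selected)
    print("clusters covered:", sorted(set(labels[selected].tolist())))
    print("log D-optimality, all 3 descriptors:", d_optimality(x[selected]))
    print("log D-optimality, descriptors 1-2 only:", d_optimality(x[selected][:, :2]))
```

Running the script prints the selected rows, the clusters they cover, and the log D-optimality with and without the near-duplicate descriptor. The nearly collinear third column drives det(X^T X) toward zero, which illustrates why D-optimality is a poor guide when descriptors are highly correlated, whereas the per-cluster draw covers every cluster regardless. numpy.linalg.slogdet is used instead of det so that a tiny determinant does not underflow.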
format Online
Article
Text
id pubmed-9850731
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-9850731 2023-01-20. Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds. Morishita, Toshiharu; Kaneko, Hiromasa. ACS Omega. American Chemical Society, 2022-12-30. /pmc/articles/PMC9850731/ /pubmed/36687084 http://dx.doi.org/10.1021/acsomega.2c05145 Text, English. © 2022 The Authors.
Published by American Chemical Society under https://creativecommons.org/licenses/by-nc-nd/4.0/, which permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit the creation of adaptations or other derivative works.
title Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850731/
https://www.ncbi.nlm.nih.gov/pubmed/36687084
http://dx.doi.org/10.1021/acsomega.2c05145