Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds
An efficient search for optimal solutions in Bayesian optimization (BO) entails providing appropriate initial samples when building a Gaussian process regression model. For general experimental designs without compounds or molecular descriptors in explanatory variable x, selecting...
Main Authors: | Morishita, Toshiharu, Kaneko, Hiromasa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2022 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850731/ https://www.ncbi.nlm.nih.gov/pubmed/36687084 http://dx.doi.org/10.1021/acsomega.2c05145 |
_version_ | 1784872247063740416 |
---|---|
author | Morishita, Toshiharu Kaneko, Hiromasa |
author_facet | Morishita, Toshiharu Kaneko, Hiromasa |
author_sort | Morishita, Toshiharu |
collection | PubMed |
description | An efficient search for optimal solutions in Bayesian optimization (BO) entails providing appropriate initial samples when building a Gaussian process regression model. For general experimental designs without compounds or molecular descriptors in explanatory variable x, selecting initial samples with a larger D-optimality allows little correlation between x in the selected samples, which leads to effective regression model building. However, in the case of experimental designs with compounds, a high correlation always exists between molecular descriptors calculated from chemical structures, and compounds with similar structures form clusters in the chemical space. Therefore, selecting the initial samples uniformly from each cluster is desirable for obtaining initial samples with maximum information on experimental conditions. As D-optimality does not work well with highly correlated molecular descriptors and does not consider information on clusters in sample selection, we propose an initial sample selection method based on clustering and apply it to the optimization of coupling reaction conditions with BO. We confirm that the proposed method reaches the optimal solution with up to 5% fewer experiments than random sampling or sampling based on D-optimality. This study makes a contribution to the initial sample selection method for BO, and we are convinced that the proposed method improves the search performance of BO in various fields of science and technology if initial samples can be determined using cluster information appropriately formed by utilizing domain knowledge. |
format | Online Article Text |
id | pubmed-9850731 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9850731 2023-01-20 Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds Morishita, Toshiharu Kaneko, Hiromasa ACS Omega An efficient search for optimal solutions in Bayesian optimization (BO) entails providing appropriate initial samples when building a Gaussian process regression model. For general experimental designs without compounds or molecular descriptors in explanatory variable x, selecting initial samples with a larger D-optimality allows little correlation between x in the selected samples, which leads to effective regression model building. However, in the case of experimental designs with compounds, a high correlation always exists between molecular descriptors calculated from chemical structures, and compounds with similar structures form clusters in the chemical space. Therefore, selecting the initial samples uniformly from each cluster is desirable for obtaining initial samples with maximum information on experimental conditions. As D-optimality does not work well with highly correlated molecular descriptors and does not consider information on clusters in sample selection, we propose an initial sample selection method based on clustering and apply it to the optimization of coupling reaction conditions with BO. We confirm that the proposed method reaches the optimal solution with up to 5% fewer experiments than random sampling or sampling based on D-optimality. This study makes a contribution to the initial sample selection method for BO, and we are convinced that the proposed method improves the search performance of BO in various fields of science and technology if initial samples can be determined using cluster information appropriately formed by utilizing domain knowledge. American Chemical Society 2022-12-30 /pmc/articles/PMC9850731/ /pubmed/36687084 http://dx.doi.org/10.1021/acsomega.2c05145 Text en © 2022 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Morishita, Toshiharu Kaneko, Hiromasa Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds |
title | Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds |
title_full | Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds |
title_fullStr | Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds |
title_full_unstemmed | Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds |
title_short | Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds |
title_sort | initial sample selection in bayesian optimization for combinatorial optimization of chemical compounds |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850731/ https://www.ncbi.nlm.nih.gov/pubmed/36687084 http://dx.doi.org/10.1021/acsomega.2c05145 |
work_keys_str_mv | AT morishitatoshiharu initialsampleselectioninbayesianoptimizationforcombinatorialoptimizationofchemicalcompounds AT kanekohiromasa initialsampleselectioninbayesianoptimizationforcombinatorialoptimizationofchemicalcompounds |
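
The description above states that selecting initial samples uniformly from each cluster is desirable. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes cluster labels for the candidate compounds are already available (for example from a clustering of molecular descriptors, which this record does not specify), and the function name `select_initial_samples` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def select_initial_samples(cluster_labels, n_samples, seed=0):
    """Pick n_samples candidate indices, drawing as evenly as possible per cluster.

    cluster_labels[i] is the (assumed, precomputed) cluster of candidate i.
    """
    if n_samples > len(cluster_labels):
        raise ValueError("n_samples exceeds the number of candidates")
    rng = random.Random(seed)

    # Group candidate indices by their cluster label.
    members = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        members[label].append(index)

    # Shuffle inside each cluster so the choice within a cluster is random.
    pools = [members[label] for label in sorted(members)]
    for pool in pools:
        rng.shuffle(pool)

    # Round-robin over clusters until enough samples are collected.
    selected = []
    while len(selected) < n_samples:
        for pool in pools:
            if pool and len(selected) < n_samples:
                selected.append(pool.pop())
    return selected


# Hypothetical example: nine candidates in three clusters, six initial samples.
labels = [0, 0, 0, 0, 1, 1, 2, 2, 2]
chosen = select_initial_samples(labels, 6)
print(sorted(chosen))
```

With these hypothetical labels, two candidates are taken from each of the three clusters. The round-robin loop is one simple way to keep the draw balanced when clusters differ in size; the selected indices would then serve as the initial experiments for fitting the Gaussian process regression model.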