Reformulating Reactivity Design for Data-Efficient Machine Learning

Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or exp...

Bibliographic Details
Main Authors: Lewis-Atwell, Toby, Beechey, Daniel, Şimşek, Özgür, Grayson, Matthew N.
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10594582/
https://www.ncbi.nlm.nih.gov/pubmed/37881791
http://dx.doi.org/10.1021/acscatal.3c02513
_version_ 1785124683537973248
author Lewis-Atwell, Toby
Beechey, Daniel
Şimşek, Özgür
Grayson, Matthew N.
author_facet Lewis-Atwell, Toby
Beechey, Daniel
Şimşek, Özgür
Grayson, Matthew N.
author_sort Lewis-Atwell, Toby
collection PubMed
description Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space as models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring less than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with high numbers of missing barriers (74% and 56%, respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach was able to identify a reaction within 1 kcal mol⁻¹ of the target barrier by only having to run 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance.
format Online
Article
Text
id pubmed-10594582
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10594582 2023-10-25 Reformulating Reactivity Design for Data-Efficient Machine Learning Lewis-Atwell, Toby Beechey, Daniel Şimşek, Özgür Grayson, Matthew N. ACS Catal Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space as models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring less than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with high numbers of missing barriers (74% and 56%, respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach was able to identify a reaction within 1 kcal mol⁻¹ of the target barrier by only having to run 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance. American Chemical Society 2023-10-06 /pmc/articles/PMC10594582/ /pubmed/37881791 http://dx.doi.org/10.1021/acscatal.3c02513 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Lewis-Atwell, Toby
Beechey, Daniel
Şimşek, Özgür
Grayson, Matthew N.
Reformulating Reactivity Design for Data-Efficient Machine Learning
title Reformulating Reactivity Design for Data-Efficient Machine Learning
title_full Reformulating Reactivity Design for Data-Efficient Machine Learning
title_fullStr Reformulating Reactivity Design for Data-Efficient Machine Learning
title_full_unstemmed Reformulating Reactivity Design for Data-Efficient Machine Learning
title_short Reformulating Reactivity Design for Data-Efficient Machine Learning
title_sort reformulating reactivity design for data-efficient machine learning
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10594582/
https://www.ncbi.nlm.nih.gov/pubmed/37881791
http://dx.doi.org/10.1021/acscatal.3c02513
work_keys_str_mv AT lewisatwelltoby reformulatingreactivitydesignfordataefficientmachinelearning
AT beecheydaniel reformulatingreactivitydesignfordataefficientmachinelearning
AT simsekozgur reformulatingreactivitydesignfordataefficientmachinelearning
AT graysonmatthewn reformulatingreactivitydesignfordataefficientmachinelearning
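
Illustrative sketch (not part of the record): the abstract describes searching a prespecified set of reactions for one whose barrier matches a target value, while measuring as few barriers as possible. The Python below is a minimal sketch of that idea under stated assumptions and is not the authors' implementation. The nearest-neighbour surrogate, the one-dimensional descriptor, the toy linear barrier function standing in for a DFT calculation, and every function and parameter name are hypothetical choices made only for illustration.

```python
import random


def knn_predict(x, measured, k=3):
    """Estimate the barrier at descriptor x as the mean of the k nearest measured points."""
    nearest = sorted(measured, key=lambda item: abs(item[0] - x))[:k]
    return sum(barrier for _, barrier in nearest) / len(nearest)


def search_for_target(candidates, oracle, target, tol=1.0, n_init=3, budget=20, seed=0):
    """Greedy search over a finite candidate set for a barrier within tol of target.

    oracle(x) stands in for one expensive barrier measurement (hypothetical).
    Returns (best_candidate, its_barrier, number_of_measurements_used).
    """
    rng = random.Random(seed)
    unmeasured = list(candidates)
    rng.shuffle(unmeasured)

    # Start from a few randomly chosen measurements.
    measured = []
    for _ in range(min(n_init, budget, len(unmeasured))):
        x = unmeasured.pop()
        measured.append((x, oracle(x)))

    while True:
        best_x, best_b = min(measured, key=lambda item: abs(item[1] - target))
        done = abs(best_b - target) <= tol
        if done or len(measured) >= budget or not unmeasured:
            return best_x, best_b, len(measured)
        # Measure next the candidate whose predicted barrier is closest to the target.
        nxt = min(unmeasured, key=lambda c: abs(knn_predict(c, measured) - target))
        unmeasured.remove(nxt)
        measured.append((nxt, oracle(nxt)))


def toy_barrier(x):
    """Toy stand-in for a measured barrier (kcal/mol); synthetic, not real data."""
    return 10.0 + 0.2 * x


candidates = list(range(100))  # hypothetical 1-D descriptors for 100 reactions
best, barrier, n_used = search_for_target(candidates, toy_barrier, target=20.0, budget=12)
print(best, barrier, n_used)
```

The stopping rule mirrors the case study's framing: stop once a measured barrier lies within the tolerance of the target, or once the measurement budget is spent. A real application would replace toy_barrier with an actual calculation or experiment and would likely use richer descriptors and a stronger surrogate model.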