Reformulating Reactivity Design for Data-Efficient Machine Learning
Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or exp...
Main Authors: | Lewis-Atwell, Toby; Beechey, Daniel; Şimşek, Özgür; Grayson, Matthew N. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10594582/ https://www.ncbi.nlm.nih.gov/pubmed/37881791 http://dx.doi.org/10.1021/acscatal.3c02513 |
_version_ | 1785124683537973248 |
---|---|
author | Lewis-Atwell, Toby Beechey, Daniel Şimşek, Özgür Grayson, Matthew N. |
author_facet | Lewis-Atwell, Toby Beechey, Daniel Şimşek, Özgür Grayson, Matthew N. |
author_sort | Lewis-Atwell, Toby |
collection | PubMed |
description | Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space as models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring less than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with high numbers of missing barriers (74% and 56%, respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach was able to identify a reaction within 1 kcal mol⁻¹ of the target barrier by only having to run 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance. |
format | Online Article Text |
id | pubmed-10594582 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10594582 2023-10-25 Reformulating Reactivity Design for Data-Efficient Machine Learning Lewis-Atwell, Toby Beechey, Daniel Şimşek, Özgür Grayson, Matthew N. ACS Catal American Chemical Society 2023-10-06 /pmc/articles/PMC10594582/ /pubmed/37881791 http://dx.doi.org/10.1021/acscatal.3c02513 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Lewis-Atwell, Toby Beechey, Daniel Şimşek, Özgür Grayson, Matthew N. Reformulating Reactivity Design for Data-Efficient Machine Learning |
title | Reformulating Reactivity Design for Data-Efficient Machine Learning |
title_full | Reformulating Reactivity Design for Data-Efficient Machine Learning |
title_fullStr | Reformulating Reactivity Design for Data-Efficient Machine Learning |
title_full_unstemmed | Reformulating Reactivity Design for Data-Efficient Machine Learning |
title_short | Reformulating Reactivity Design for Data-Efficient Machine Learning |
title_sort | reformulating reactivity design for data-efficient machine learning |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10594582/ https://www.ncbi.nlm.nih.gov/pubmed/37881791 http://dx.doi.org/10.1021/acscatal.3c02513 |
work_keys_str_mv | AT lewisatwelltoby reformulatingreactivitydesignfordataefficientmachinelearning AT beecheydaniel reformulatingreactivitydesignfordataefficientmachinelearning AT simsekozgur reformulatingreactivitydesignfordataefficientmachinelearning AT graysonmatthewn reformulatingreactivitydesignfordataefficientmachinelearning |