Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors
Accurate and rapid evaluation of whether substrates can undergo the desired transformation is crucial and challenging for both human knowledge and computer predictions. Despite the potential of machine learning in predicting chemical reactivity such as selectivity, popular feature engineering an...
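Main Authors: | Guan, Yanfei; Coley, Connor W.; Wu, Haoyang; Ranasinghe, Duminda; Heid, Esther; Struble, Thomas J.; Pattanaik, Lagnajit; Green, William H.; Jensen, Klavs F. |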
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society of Chemistry 2020 |
Subjects: | Chemistry |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8179287/ https://www.ncbi.nlm.nih.gov/pubmed/34163985 http://dx.doi.org/10.1039/d0sc04823b |
_version_ | 1783703746056814592 |
---|---|
author | Guan, Yanfei Coley, Connor W. Wu, Haoyang Ranasinghe, Duminda Heid, Esther Struble, Thomas J. Pattanaik, Lagnajit Green, William H. Jensen, Klavs F. |
author_facet | Guan, Yanfei Coley, Connor W. Wu, Haoyang Ranasinghe, Duminda Heid, Esther Struble, Thomas J. Pattanaik, Lagnajit Green, William H. Jensen, Klavs F. |
author_sort | Guan, Yanfei |
collection | PubMed |
description | Accurate and rapid evaluation of whether substrates can undergo the desired transformation is crucial and challenging for both human knowledge and computer predictions. Despite the potential of machine learning in predicting chemical reactivity such as selectivity, popular feature engineering and learning methods are either time-consuming or data-hungry. We introduce a new method that combines machine-learned reaction representation with selected quantum mechanical descriptors to predict regio-selectivity in general substitution reactions. We construct a reactivity descriptor database based on ab initio calculations of 130k organic molecules, and train a multi-task constrained model to calculate demanded descriptors on-the-fly. The proposed platform enhances the inter/extra-polated performance for regio-selectivity predictions and enables learning from small datasets with just hundreds of examples. Furthermore, the proposed protocol is demonstrated to be generally applicable to a diverse range of chemical spaces. For three general types of substitution reactions (aromatic C–H functionalization, aromatic C–X substitution, and other substitution reactions) curated from a commercial database, the fusion model achieves 89.7%, 96.7%, and 97.2% top-1 accuracy in predicting the major outcome, respectively, each using 5000 training reactions. Using predicted descriptors, the fusion model is end-to-end, and requires approximately only 70 ms per reaction to predict the selectivity from reaction SMILES strings. |
format | Online Article Text |
id | pubmed-8179287 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | The Royal Society of Chemistry |
record_format | MEDLINE/PubMed |
spelling | pubmed-8179287 2021-06-22 Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors Guan, Yanfei Coley, Connor W. Wu, Haoyang Ranasinghe, Duminda Heid, Esther Struble, Thomas J. Pattanaik, Lagnajit Green, William H. Jensen, Klavs F. Chem Sci Chemistry Accurate and rapid evaluation of whether substrates can undergo the desired transformation is crucial and challenging for both human knowledge and computer predictions. Despite the potential of machine learning in predicting chemical reactivity such as selectivity, popular feature engineering and learning methods are either time-consuming or data-hungry. We introduce a new method that combines machine-learned reaction representation with selected quantum mechanical descriptors to predict regio-selectivity in general substitution reactions. We construct a reactivity descriptor database based on ab initio calculations of 130k organic molecules, and train a multi-task constrained model to calculate demanded descriptors on-the-fly. The proposed platform enhances the inter/extra-polated performance for regio-selectivity predictions and enables learning from small datasets with just hundreds of examples. Furthermore, the proposed protocol is demonstrated to be generally applicable to a diverse range of chemical spaces. For three general types of substitution reactions (aromatic C–H functionalization, aromatic C–X substitution, and other substitution reactions) curated from a commercial database, the fusion model achieves 89.7%, 96.7%, and 97.2% top-1 accuracy in predicting the major outcome, respectively, each using 5000 training reactions. Using predicted descriptors, the fusion model is end-to-end, and requires approximately only 70 ms per reaction to predict the selectivity from reaction SMILES strings. The Royal Society of Chemistry 2020-12-22 /pmc/articles/PMC8179287/ /pubmed/34163985 http://dx.doi.org/10.1039/d0sc04823b Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by/3.0/ |
spellingShingle | Chemistry Guan, Yanfei Coley, Connor W. Wu, Haoyang Ranasinghe, Duminda Heid, Esther Struble, Thomas J. Pattanaik, Lagnajit Green, William H. Jensen, Klavs F. Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors |
title | Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors |
title_full | Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors |
title_fullStr | Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors |
title_full_unstemmed | Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors |
title_short | Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors |
title_sort | regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors |
topic | Chemistry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8179287/ https://www.ncbi.nlm.nih.gov/pubmed/34163985 http://dx.doi.org/10.1039/d0sc04823b |
work_keys_str_mv | AT guanyanfei regioselectivitypredictionwithamachinelearnedreactionrepresentationandontheflyquantummechanicaldescriptors AT coleyconnorw regioselectivitypredictionwithamachinelearnedreactionrepresentationandontheflyquantummechanicaldescriptors AT wuhaoyang regioselectivitypredictionwithamachinelearnedreactionrepresentationandontheflyquantummechanicaldescriptors AT ranasingheduminda regioselectivitypredictionwithamachinelearnedreactionrepresentationandontheflyquantummechanicaldescriptors AT heidesther regioselectivitypredictionwithamachinelearnedreactionrepresentationandontheflyquantummechanicaldescriptors AT strublethomasj regioselectivitypredictionwithamachinelearnedreactionrepresentationandontheflyquantummechanicaldescriptors AT pattanaiklagnajit regioselectivitypredictionwithamachinelearnedreactionrepresentationandontheflyquantummechanicaldescriptors AT greenwilliamh regioselectivitypredictionwithamachinelearnedreactionrepresentationandontheflyquantummechanicaldescriptors AT jensenklavsf regioselectivitypredictionwithamachinelearnedreactionrepresentationandontheflyquantummechanicaldescriptors |