
Chemistry-informed molecular graph as reaction descriptor for machine-learned retrosynthesis planning

Infusing “chemical wisdom” should improve the data-driven approaches that rely exclusively on historical synthetic data for automatic retrosynthesis planning. For this purpose, we designed a chemistry-informed molecular graph (CIMG) to describe chemical reactions. A collection of key information tha...


Bibliographic Details
Main authors: Zhang, Baicheng, Zhang, Xiaolong, Du, Wenjie, Song, Zhaokun, Zhang, Guozhen, Zhang, Guoqing, Wang, Yang, Chen, Xin, Jiang, Jun, Luo, Yi
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9564830/
https://www.ncbi.nlm.nih.gov/pubmed/36191228
http://dx.doi.org/10.1073/pnas.2212711119
collection PubMed
description Infusing “chemical wisdom” should improve the data-driven approaches that rely exclusively on historical synthetic data for automatic retrosynthesis planning. For this purpose, we designed a chemistry-informed molecular graph (CIMG) to describe chemical reactions. A collection of key information that is most relevant to chemical reactions is integrated in CIMG: NMR chemical shifts as vertex features, bond dissociation energies as edge features, and solvent/catalyst information as global features. For any given compound as a target, a product CIMG is generated and exploited by a graph neural network (GNN) model to choose reaction template(s) leading to this product. A reactant CIMG is then inferred and used in two GNN models to select appropriate catalyst and solvent, respectively. Finally, a fourth GNN model compares the two CIMG descriptors to check the plausibility of the proposed reaction. A reaction vector is obtained for every molecule in training these models. The chemical wisdom of reaction propensity contained in the pretrained reaction vectors is exploited to autocategorize molecules/reactions and to accelerate Monte Carlo tree search (MCTS) for multistep retrosynthesis planning. Full synthetic routes with recommended catalysts/solvents are predicted efficiently using this CIMG-based approach.
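The three feature levels named in the description (vertex, edge, global) can be pictured as a small attributed graph. The following is only an illustrative sketch, not the authors' implementation: the class name `CIMGSketch`, its fields, and its methods are hypothetical, and every number and label in the example is an arbitrary placeholder rather than a measured or computed value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CIMGSketch:
    """Hypothetical container with the three feature levels from the abstract."""
    # Vertex features: one NMR chemical shift per atom (placeholder values).
    vertex_shifts: List[float]
    # Edge features: bond (i, j) with i < j -> bond dissociation energy.
    edge_bde: Dict[Tuple[int, int], float] = field(default_factory=dict)
    # Global features: e.g. solvent / catalyst labels.
    global_features: Dict[str, str] = field(default_factory=dict)

    def add_bond(self, i: int, j: int, bde: float) -> None:
        """Store an undirected bond under a canonical (low, high) key."""
        n = len(self.vertex_shifts)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError("bond must join two distinct existing atoms")
        self.edge_bde[(min(i, j), max(i, j))] = bde

    def neighbors(self, i: int) -> List[int]:
        """Return the sorted indices of atoms bonded to atom i."""
        return sorted(b if a == i else a
                      for (a, b) in self.edge_bde if i in (a, b))


# Three-atom chain with placeholder numbers (not real chemistry data).
g = CIMGSketch(vertex_shifts=[10.0, 20.0, 30.0])
g.add_bond(0, 1, 100.0)
g.add_bond(2, 1, 90.0)
g.global_features["solvent"] = "placeholder-solvent"
```

In a real pipeline an object of this kind would be converted to tensors for a GNN library; that step is omitted here because the record does not say which library or encoding the authors used.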
format Online
Article
Text
id pubmed-9564830
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-9564830 2023-04-03 Proc Natl Acad Sci U S A Physical Sciences National Academy of Sciences 2022-10-03 2022-10-11 /pmc/articles/PMC9564830/ /pubmed/36191228 http://dx.doi.org/10.1073/pnas.2212711119 Text en Copyright © 2022 the Author(s). Published by PNAS.
This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Chemistry-informed molecular graph as reaction descriptor for machine-learned retrosynthesis planning
topic Physical Sciences