Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph

Bibliographic Details
Main Authors: Li, Baiqing, Chen, Hongming
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8838603/
https://www.ncbi.nlm.nih.gov/pubmed/35164303
http://dx.doi.org/10.3390/molecules27031039
_version_ 1784650167394238464
author Li, Baiqing
Chen, Hongming
author_facet Li, Baiqing
Chen, Hongming
author_sort Li, Baiqing
collection PubMed
description With the increasing application of deep-learning-based generative models for de novo molecule design, the quantitative estimation of molecular synthetic accessibility (SA) has become a crucial factor for prioritizing the structures generated from generative models. It is also useful for prioritizing hit/lead compounds and guiding retrosynthesis analysis. In this study, based on the USPTO and Pistachio reaction datasets, a chemical reaction network was constructed for the identification of the shortest reaction paths (SRP) needed to synthesize compounds, and different SRP cut-offs were then used as the threshold to classify an organic compound as either easy-to-synthesize (ES) or hard-to-synthesize (HS). Two synthesis accessibility models (DNN-ECFP model and graph-based CMPNN model) were built using deep learning/machine learning algorithms. Compared to other existing synthesis accessibility scoring schemes, such as SYBA, SCScore, and SAScore, our results show that CMPNN (ROC AUC: 0.791) performs better than SYBA (ROC AUC: 0.76), albeit marginally, and outperforms SAScore and SCScore. Our prediction models based on historical reaction knowledge could be a potential tool for estimating molecule SA.
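The shortest-reaction-path labelling described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reaction tuples, the compound names, the set of starting materials, the rule that a product's depth is one more than its deepest reactant, and the cut-off value are all hypothetical choices made for the example.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical representation: a reaction is (set of reactants, single product).
Reaction = Tuple[FrozenSet[str], str]


def shortest_reaction_paths(
    reactions: List[Reaction], starting_materials: Iterable[str]
) -> Dict[str, int]:
    """Return the minimum number of reaction steps needed to reach each compound.

    Assumption: starting materials have depth 0, and a product can be made
    only when all of its reactants are reachable; its depth is then one more
    than the deepest reactant. The loop relaxes depths until nothing changes.
    """
    depth: Dict[str, int] = {m: 0 for m in starting_materials}
    changed = True
    while changed:
        changed = False
        for reactants, product in reactions:
            if all(r in depth for r in reactants):
                candidate = 1 + max((depth[r] for r in reactants), default=0)
                if candidate < depth.get(product, float("inf")):
                    depth[product] = candidate
                    changed = True
    return depth


def label(depth: Dict[str, int], compound: str, cutoff: int) -> Optional[str]:
    """Label a compound ES or HS by comparing its SRP to a cut-off.

    Returns None when the compound has no route in the network; how such
    compounds should be treated is left open in this sketch.
    """
    steps = depth.get(compound)
    if steps is None:
        return None
    return "ES" if steps <= cutoff else "HS"


# Toy network with placeholder compound names (not real data).
toy_reactions: List[Reaction] = [
    (frozenset({"E"}), "F"),
    (frozenset({"C", "D"}), "E"),
    (frozenset({"A", "B"}), "C"),
]
toy_depth = shortest_reaction_paths(toy_reactions, {"A", "B", "D"})
for name in ("C", "E", "F"):
    print(name, toy_depth[name], label(toy_depth, name, cutoff=2))
```

Labels produced this way would then serve as training targets for a classifier; the abstract names a DNN on ECFP fingerprints and a graph-based CMPNN for that role, neither of which is reproduced here.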
format Online
Article
Text
id pubmed-8838603
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8838603 2022-02-13 Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph Li, Baiqing Chen, Hongming Molecules Article With the increasing application of deep-learning-based generative models for de novo molecule design, the quantitative estimation of molecular synthetic accessibility (SA) has become a crucial factor for prioritizing the structures generated from generative models. It is also useful for prioritizing hit/lead compounds and guiding retrosynthesis analysis. In this study, based on the USPTO and Pistachio reaction datasets, a chemical reaction network was constructed for the identification of the shortest reaction paths (SRP) needed to synthesize compounds, and different SRP cut-offs were then used as the threshold to classify an organic compound as either easy-to-synthesize (ES) or hard-to-synthesize (HS). Two synthesis accessibility models (DNN-ECFP model and graph-based CMPNN model) were built using deep learning/machine learning algorithms. Compared to other existing synthesis accessibility scoring schemes, such as SYBA, SCScore, and SAScore, our results show that CMPNN (ROC AUC: 0.791) performs better than SYBA (ROC AUC: 0.76), albeit marginally, and outperforms SAScore and SCScore. Our prediction models based on historical reaction knowledge could be a potential tool for estimating molecule SA. MDPI 2022-02-03 /pmc/articles/PMC8838603/ /pubmed/35164303 http://dx.doi.org/10.3390/molecules27031039 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Li, Baiqing
Chen, Hongming
Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph
title Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph
title_full Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph
title_fullStr Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph
title_full_unstemmed Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph
title_short Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph
title_sort prediction of compound synthesis accessibility based on reaction knowledge graph
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8838603/
https://www.ncbi.nlm.nih.gov/pubmed/35164303
http://dx.doi.org/10.3390/molecules27031039
work_keys_str_mv AT libaiqing predictionofcompoundsynthesisaccessibilitybasedonreactionknowledgegraph
AT chenhongming predictionofcompoundsynthesisaccessibilitybasedonreactionknowledgegraph