CNN-based two-branch multi-scale feature extraction network for retrosynthesis prediction

BACKGROUND: Retrosynthesis prediction is the task of deducing reactants from reaction products, which is of great importance for designing the synthesis routes of the target products. The product molecules are generally represented with some descriptors such as simplified molecular input line entry...

Bibliographic Details
Main authors: Yang, Feng; Liu, Juan; Zhang, Qiang; Yang, Zhihui; Zhang, Xiaolei
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9440582/
https://www.ncbi.nlm.nih.gov/pubmed/36056300
http://dx.doi.org/10.1186/s12859-022-04904-7
author Yang, Feng
Liu, Juan
Zhang, Qiang
Yang, Zhihui
Zhang, Xiaolei
collection PubMed
description BACKGROUND: Retrosynthesis prediction is the task of deducing reactants from reaction products, which is of great importance for designing synthesis routes for target products. To build prediction models, product molecules are generally represented with descriptors such as simplified molecular input line entry specification (SMILES) strings or molecular fingerprints. However, most existing models use only one molecular descriptor and treat that descriptor as a whole rather than mining multi-scale features from it, so they cannot fully exploit the information in molecules and their descriptors. RESULTS: We propose a novel model to address these concerns. First, we build a new convolutional neural network (CNN) based feature extraction network that extracts multi-scale features from the molecular descriptors using several filters of different sizes. Then, we use a two-branch feature extraction layer to fuse the multi-scale features of several molecular descriptors and perform retrosynthesis prediction without expert knowledge. A comparison with other models on the benchmark USPTO-50k chemical dataset shows that our model surpasses the state-of-the-art model by 7.4%, 10.8%, 11.7% and 12.2% in top-1, top-3, top-5 and top-10 accuracy, respectively. Because compounds in metabolic reactions are much more difficult to featurize than those in chemical reactions, there is no related work on bioretrosynthesis prediction; we therefore further test the feasibility of our model on this task using the well-known MetaNetX metabolic dataset, and achieve top-1, top-3, top-5 and top-10 accuracies of 45.2%, 67.0%, 73.6% and 82.2%, respectively. CONCLUSION: The comparison on USPTO-50k indicates that our proposed model surpasses the existing state-of-the-art model. The evaluation on the MetaNetX dataset indicates that models used for retrosynthesis prediction can also be used for bioretrosynthesis prediction.
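The two ideas named in the abstract, multi-scale feature extraction with filters of different sizes and two-branch fusion of several descriptors, can be illustrated with a minimal sketch, together with the top-k accuracy metric the results are reported in. This is not the authors' implementation: the record gives no architectural details, so the kernel sizes, the fixed averaging kernels (a real CNN would learn its filter weights), the global max pooling, fusion by concatenation, and all function names below are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def conv1d_max(signal: Sequence[float], kernel: Sequence[float]) -> float:
    """Valid 1-D cross-correlation followed by global max pooling."""
    k = len(kernel)
    if k > len(signal):
        raise ValueError("kernel is longer than the signal")
    responses = [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]
    return max(responses)


def multi_scale_features(
    signal: Sequence[float], kernel_sizes: Sequence[int] = (2, 3, 5)
) -> List[float]:
    """One pooled feature per kernel size.

    Assumption: fixed averaging kernels stand in for learned CNN filters.
    """
    return [conv1d_max(signal, [1.0 / k] * k) for k in kernel_sizes]


def two_branch_features(
    branch_a: Sequence[float],
    branch_b: Sequence[float],
    kernel_sizes: Sequence[int] = (2, 3, 5),
) -> List[float]:
    """Fuse the multi-scale features of two descriptors by concatenation."""
    return multi_scale_features(branch_a, kernel_sizes) + multi_scale_features(
        branch_b, kernel_sizes
    )


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]], targets: Sequence[str], k: int
) -> float:
    """Fraction of examples whose target appears among the first k predictions."""
    hits = sum(
        1 for preds, target in zip(ranked_predictions, targets) if target in preds[:k]
    )
    return hits / len(targets)


# Hypothetical inputs: a toy bit-vector fingerprint and a toy numeric
# encoding of a SMILES string (both invented for this example).
fingerprint = [0, 1, 1, 0, 1, 1, 1, 0]
smiles_encoding = [0.2, 0.4, 0.1, 0.9, 0.3, 0.5, 0.7, 0.6]
fused = two_branch_features(fingerprint, smiles_encoding)
print(len(fused), "fused features")
```

Each branch yields one feature per kernel size, so small kernels respond to short local patterns and larger kernels to longer ones; the fused vector would then feed a classifier whose ranked outputs are scored with `top_k_accuracy`.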
format Online
Article
Text
id pubmed-9440582
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9440582 2022-09-04 CNN-based two-branch multi-scale feature extraction network for retrosynthesis prediction Yang, Feng; Liu, Juan; Zhang, Qiang; Yang, Zhihui; Zhang, Xiaolei BMC Bioinformatics Research BioMed Central 2022-09-02 /pmc/articles/PMC9440582/ /pubmed/36056300 http://dx.doi.org/10.1186/s12859-022-04904-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title CNN-based two-branch multi-scale feature extraction network for retrosynthesis prediction
topic Research