CNN-based two-branch multi-scale feature extraction network for retrosynthesis prediction
Main authors: | Yang, Feng; Liu, Juan; Zhang, Qiang; Yang, Zhihui; Zhang, Xiaolei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9440582/ https://www.ncbi.nlm.nih.gov/pubmed/36056300 http://dx.doi.org/10.1186/s12859-022-04904-7 |
author | Yang, Feng; Liu, Juan; Zhang, Qiang; Yang, Zhihui; Zhang, Xiaolei |
collection | PubMed |
description | BACKGROUND: Retrosynthesis prediction is the task of deducing the reactants of a reaction from its products, and it is of great importance for designing synthesis routes to target products. To build prediction models, product molecules are generally represented with descriptors such as the simplified molecular-input line-entry system (SMILES) or molecular fingerprints. However, most existing models use only one molecular descriptor and treat it as a whole rather than further mining multi-scale features, so they cannot fully and finely exploit the features of the molecules and their descriptors. RESULTS: We propose a novel model to address these concerns. First, we build a new convolutional neural network (CNN)-based feature extraction network that extracts multi-scale features from the molecular descriptors by using several filters of different sizes. Then, we use a two-branch feature extraction layer to fuse the multi-scale features of several molecular descriptors and perform retrosynthesis prediction without expert knowledge. Comparison with other models on the benchmark USPTO-50k chemical dataset shows that our model surpasses the state-of-the-art model by 7.4%, 10.8%, 11.7% and 12.2% in top-1, top-3, top-5 and top-10 accuracy, respectively. Because compounds in metabolic reactions are much more difficult to featurize than those in chemical reactions, there is no related work on bioretrosynthesis prediction; we therefore further test the feasibility of our model on this task using the well-known MetaNetX metabolic dataset, achieving top-1, top-3, top-5 and top-10 accuracies of 45.2%, 67.0%, 73.6% and 82.2%, respectively. CONCLUSION: The comparison on USPTO-50k indicates that our proposed model surpasses the existing state-of-the-art model. The evaluation on the MetaNetX dataset indicates that models used for retrosynthesis prediction can also be used for bioretrosynthesis prediction. |
id | pubmed-9440582 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | BMC Bioinformatics (Research), published by BioMed Central on 2022-09-02. © The Author(s) 2022 |
rights | Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | CNN-based two-branch multi-scale feature extraction network for retrosynthesis prediction |
topic | Research |
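The abstract describes the architecture only at a high level. The pure-Python sketch below illustrates the two ideas it names under stated assumptions: each descriptor is treated as a 1-D numeric vector, one hand-set filter is used per kernel size, activations pass through ReLU and global max pooling, and the two branches (a fingerprint and a numeric encoding of SMILES) are fused by simple concatenation. All function names, inputs and weights are hypothetical and are not the authors' implementation; a trained model would learn its filters in a deep-learning framework and pass the fused vector to a prediction layer. The `top_k_accuracy` helper shows the usual meaning of the top-1, top-3, top-5 and top-10 figures: a test example counts as correct when the true answer appears among the model's first k ranked candidates.

```python
from typing import List, Sequence

# Hypothetical sketch. Kernel sizes, one filter per size, hand-set weights,
# ReLU + global max pooling, and concatenation as the "fusion" step are all
# illustrative assumptions, not the architecture reported in the paper.


def conv1d_valid(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """1-D 'valid' cross-correlation, which is what CNN convolution layers compute."""
    k = len(kernel)
    if k == 0 or k > len(x):
        raise ValueError("kernel must be non-empty and no longer than the input")
    return [
        sum(x[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(x) - k + 1)
    ]


def multi_scale_features(x: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[float]:
    """One feature per kernel: ReLU of the convolution, then global max pooling.

    Kernels of different lengths see windows of different widths over the
    descriptor, so the pooled values summarise it at several scales.
    """
    features = []
    for kernel in kernels:
        activations = [max(0.0, v) for v in conv1d_valid(x, kernel)]
        features.append(max(activations))
    return features


def two_branch_fusion(
    fingerprint: Sequence[float],
    smiles_encoding: Sequence[float],
    fp_kernels: Sequence[Sequence[float]],
    smiles_kernels: Sequence[Sequence[float]],
) -> List[float]:
    """Run each descriptor through its own multi-scale branch, then concatenate."""
    return multi_scale_features(fingerprint, fp_kernels) + multi_scale_features(
        smiles_encoding, smiles_kernels
    )


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]], targets: Sequence[str], k: int
) -> float:
    """Fraction of examples whose true answer is among the first k ranked candidates."""
    if not targets or len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked candidate list per target, and at least one target")
    hits = sum(1 for preds, target in zip(ranked_predictions, targets) if target in preds[:k])
    return hits / len(targets)


if __name__ == "__main__":
    # Toy inputs: an 8-bit "fingerprint" and a short numeric stand-in for encoded SMILES.
    fp = [1, 0, 1, 1, 0, 0, 1, 0]
    smi = [0.2, 0.5, 0.1, 0.9, 0.4]
    fp_kernels = [[1.0, 1.0], [1.0, -1.0, 1.0]]  # widths 2 and 3
    smi_kernels = [[0.5, 0.5], [1.0, 0.0, -1.0, 1.0]]  # widths 2 and 4
    print(two_branch_fusion(fp, smi, fp_kernels, smi_kernels))

    # Hypothetical ranked candidates for two test products.
    ranked = [["CCO", "CO"], ["CC(=O)O", "CCO"]]
    truth = ["CCO", "CCO"]
    print(top_k_accuracy(ranked, truth, 1), top_k_accuracy(ranked, truth, 2))  # 0.5 1.0
```

Because of the global max pooling, each branch outputs one value per kernel regardless of how long its descriptor is, which is one common reason this pattern is used when descriptors of different lengths have to be fused.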