Retrosynthesis with attention-based NMT model and chemical analysis of “wrong” predictions

We consider retrosynthesis to be a machine translation problem. Accordingly, we apply an attention-based and completely data-driven model named Tensor2Tensor to a data set comprising approximately 50 000 diverse reactions extracted from the United States patent literature. The model significantly ou...

Bibliographic Details
Main authors: Duan, Hongliang; Wang, Ling; Zhang, Chengyun; Guo, Lin; Li, Jianjun
Format: Online Article Text
Language: English
Published: The Royal Society of Chemistry 2020
Subjects: Chemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9047528/
https://www.ncbi.nlm.nih.gov/pubmed/35494683
http://dx.doi.org/10.1039/c9ra08535a
_version_ 1784695746707062784
author Duan, Hongliang
Wang, Ling
Zhang, Chengyun
Guo, Lin
Li, Jianjun
author_sort Duan, Hongliang
collection PubMed
description We consider retrosynthesis to be a machine translation problem. Accordingly, we apply an attention-based and completely data-driven model named Tensor2Tensor to a data set comprising approximately 50 000 diverse reactions extracted from the United States patent literature. The model significantly outperforms the seq2seq model (37.4%), with top-1 accuracy reaching 54.1%. We also offer a novel insight into the causes of grammatically invalid SMILES, and conduct a test in which experienced chemists select and analyze the “wrong” predictions that may be chemically plausible but differ from the ground truth. The effectiveness of our model is found to be underestimated and the “true” top-1 accuracy reaches as high as 64.6%.
format Online
Article
Text
id pubmed-9047528
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher The Royal Society of Chemistry
record_format MEDLINE/PubMed
spelling pubmed-9047528; 2022-04-28; Retrosynthesis with attention-based NMT model and chemical analysis of “wrong” predictions; Duan, Hongliang; Wang, Ling; Zhang, Chengyun; Guo, Lin; Li, Jianjun; RSC Adv; Chemistry; The Royal Society of Chemistry; 2020-01-08; /pmc/articles/PMC9047528/; /pubmed/35494683; http://dx.doi.org/10.1039/c9ra08535a; Text; en; This journal is © The Royal Society of Chemistry; https://creativecommons.org/licenses/by/3.0/
title Retrosynthesis with attention-based NMT model and chemical analysis of “wrong” predictions
topic Chemistry
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9047528/
https://www.ncbi.nlm.nih.gov/pubmed/35494683
http://dx.doi.org/10.1039/c9ra08535a