Retrosynthesis with attention-based NMT model and chemical analysis of “wrong” predictions
Main authors: | Duan, Hongliang; Wang, Ling; Zhang, Chengyun; Guo, Lin; Li, Jianjun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society of Chemistry, 2020 |
Subjects: | Chemistry |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9047528/ https://www.ncbi.nlm.nih.gov/pubmed/35494683 http://dx.doi.org/10.1039/c9ra08535a |
_version_ | 1784695746707062784 |
---|---|
author | Duan, Hongliang; Wang, Ling; Zhang, Chengyun; Guo, Lin; Li, Jianjun |
collection | PubMed |
description | We consider retrosynthesis to be a machine translation problem. Accordingly, we apply an attention-based and completely data-driven model named Tensor2Tensor to a data set comprising approximately 50 000 diverse reactions extracted from the United States patent literature. The model reaches a top-1 accuracy of 54.1%, significantly outperforming the seq2seq model (37.4%). We also offer new insight into the causes of grammatically invalid SMILES, and conduct a test in which experienced chemists select and analyze the “wrong” predictions that may be chemically plausible but differ from the ground truth. We find that the effectiveness of our model is underestimated: the “true” top-1 accuracy reaches as high as 64.6%. |
format | Online Article Text |
id | pubmed-9047528 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | The Royal Society of Chemistry |
record_format | MEDLINE/PubMed |
spelling | pubmed-9047528, 2022-04-28; RSC Adv; Chemistry; The Royal Society of Chemistry, 2020-01-08; This journal is © The Royal Society of Chemistry; https://creativecommons.org/licenses/by/3.0/ |
title | Retrosynthesis with attention-based NMT model and chemical analysis of “wrong” predictions |
topic | Chemistry |