Single-step retrosynthesis prediction by leveraging commonly preserved substructures
Retrosynthesis analysis is an important task in organic chemistry with numerous industrial applications. Previously, machine learning approaches employing natural language processing techniques achieved promising results in this task by first representing reactant molecules as strings and subsequent...
Main authors: | Fang, Lei; Li, Junren; Zhao, Ming; Tan, Li; Lou, Jian-Guang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147675/ https://www.ncbi.nlm.nih.gov/pubmed/37117216 http://dx.doi.org/10.1038/s41467-023-37969-w |
_version_ | 1785034842048561152 |
---|---|
author | Fang, Lei Li, Junren Zhao, Ming Tan, Li Lou, Jian-Guang |
author_facet | Fang, Lei Li, Junren Zhao, Ming Tan, Li Lou, Jian-Guang |
author_sort | Fang, Lei |
collection | PubMed |
description | Retrosynthesis analysis is an important task in organic chemistry with numerous industrial applications. Previously, machine learning approaches employing natural language processing techniques achieved promising results in this task by first representing reactant molecules as strings and subsequently predicting reactant molecules using text generation or machine translation models. Chemists cannot readily derive useful insights from traditional approaches that rely largely on atom-level decoding in the string representations, because human experts tend to interpret reactions by analyzing substructures that comprise a molecule. It is well-established that some substructures are stable and remain unchanged in reactions. In this paper, we developed a substructure-level decoding model, where commonly preserved portions of product molecules were automatically extracted with a fully data-driven approach. Our model achieves improvement over previously reported models, and we demonstrate that its performance can be boosted further by enhancing the accuracy of these substructures. Analyzing substructures extracted from our machine learning model can provide human experts with additional insights to assist decision-making in retrosynthesis analysis. |
format | Online Article Text |
id | pubmed-10147675 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10147675 2023-04-30 Single-step retrosynthesis prediction by leveraging commonly preserved substructures Fang, Lei Li, Junren Zhao, Ming Tan, Li Lou, Jian-Guang Nat Commun Article Retrosynthesis analysis is an important task in organic chemistry with numerous industrial applications. Previously, machine learning approaches employing natural language processing techniques achieved promising results in this task by first representing reactant molecules as strings and subsequently predicting reactant molecules using text generation or machine translation models. Chemists cannot readily derive useful insights from traditional approaches that rely largely on atom-level decoding in the string representations, because human experts tend to interpret reactions by analyzing substructures that comprise a molecule. It is well-established that some substructures are stable and remain unchanged in reactions. In this paper, we developed a substructure-level decoding model, where commonly preserved portions of product molecules were automatically extracted with a fully data-driven approach. Our model achieves improvement over previously reported models, and we demonstrate that its performance can be boosted further by enhancing the accuracy of these substructures. Analyzing substructures extracted from our machine learning model can provide human experts with additional insights to assist decision-making in retrosynthesis analysis. 
Nature Publishing Group UK 2023-04-28 /pmc/articles/PMC10147675/ /pubmed/37117216 http://dx.doi.org/10.1038/s41467-023-37969-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Fang, Lei Li, Junren Zhao, Ming Tan, Li Lou, Jian-Guang Single-step retrosynthesis prediction by leveraging commonly preserved substructures |
title | Single-step retrosynthesis prediction by leveraging commonly preserved substructures |
title_full | Single-step retrosynthesis prediction by leveraging commonly preserved substructures |
title_fullStr | Single-step retrosynthesis prediction by leveraging commonly preserved substructures |
title_full_unstemmed | Single-step retrosynthesis prediction by leveraging commonly preserved substructures |
title_short | Single-step retrosynthesis prediction by leveraging commonly preserved substructures |
title_sort | single-step retrosynthesis prediction by leveraging commonly preserved substructures |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147675/ https://www.ncbi.nlm.nih.gov/pubmed/37117216 http://dx.doi.org/10.1038/s41467-023-37969-w |
work_keys_str_mv | AT fanglei singlestepretrosynthesispredictionbyleveragingcommonlypreservedsubstructures AT lijunren singlestepretrosynthesispredictionbyleveragingcommonlypreservedsubstructures AT zhaoming singlestepretrosynthesispredictionbyleveragingcommonlypreservedsubstructures AT tanli singlestepretrosynthesispredictionbyleveragingcommonlypreservedsubstructures AT loujianguang singlestepretrosynthesispredictionbyleveragingcommonlypreservedsubstructures |