Root-aligned SMILES: a tight representation for chemical reaction prediction

Chemical reaction prediction, involving forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem, where the typical SMILES is adopted for molecule rep...

Bibliographic Details
Main authors: Zhong, Zipeng, Song, Jie, Feng, Zunlei, Liu, Tiantao, Jia, Lingxiang, Yao, Shaolun, Wu, Min, Hou, Tingjun, Song, Mingli
Format: Online Article Text
Language: English
Published: The Royal Society of Chemistry 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9365080/
https://www.ncbi.nlm.nih.gov/pubmed/36091202
http://dx.doi.org/10.1039/d2sc02763a
author Zhong, Zipeng
Song, Jie
Feng, Zunlei
Liu, Tiantao
Jia, Lingxiang
Yao, Shaolun
Wu, Min
Hou, Tingjun
Song, Mingli
collection PubMed
description Chemical reaction prediction, involving forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem, where the typical SMILES is adopted for molecule representations. However, the general-purpose SMILES neglects the characteristics of chemical reactions, where the molecular graph topology is largely unaltered from reactants to products, resulting in the suboptimal performance of SMILES if straightforwardly applied. In this article, we propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient synthesis prediction. Due to the strict one-to-one mapping and reduced edit distance, the computational model is largely relieved from learning the complex syntax and dedicated to learning the chemical knowledge for reactions. We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method.
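The description's point about reduced edit distance can be illustrated with a minimal sketch. It assumes hand-written SMILES strings for an esterification (ethyl acetate from acetic acid and ethanol) and a plain character-level Levenshtein distance; the strings, the function name `levenshtein`, and the variable names are illustrative assumptions, not the authors' data or tooling. Both reactant strings describe the same molecules; only the atom each string starts from differs.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical example strings (assumed for illustration only).
product = "CC(=O)OCC"                # ethyl acetate
reactants_aligned = "CC(=O)O.OCC"    # written from the same starting atoms as the product
reactants_unaligned = "OC(C)=O.OCC"  # same molecules, acid written from a different atom

d_aligned = levenshtein(product, reactants_aligned)
d_unaligned = levenshtein(product, reactants_unaligned)
print(d_aligned, d_unaligned)
```

In this toy case the aligned reactant string differs from the product string by fewer character edits than the unaligned one, which is the kind of gap the description attributes to root alignment. All tokens here are single characters, so character-level distance is adequate; multi-character SMILES tokens such as `Cl` or bracket atoms would need a tokenizer first.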
format Online
Article
Text
id pubmed-9365080
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher The Royal Society of Chemistry
record_format MEDLINE/PubMed
spelling pubmed-9365080 2022-09-08 Chem Sci Chemistry The Royal Society of Chemistry 2022-07-12 /pmc/articles/PMC9365080/ /pubmed/36091202 http://dx.doi.org/10.1039/d2sc02763a Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by-nc/3.0/
title Root-aligned SMILES: a tight representation for chemical reaction prediction
topic Chemistry