
SMILES-based deep generative scaffold decorator for de-novo drug design

Bibliographic Details
Main authors: Arús-Pous, Josep; Patronov, Atanas; Bjerrum, Esben Jannik; Tyrchan, Christian; Reymond, Jean-Louis; Chen, Hongming; Engkvist, Ola
Format: Online Article Text
Language: English
Published: Springer International Publishing 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7260788/
https://www.ncbi.nlm.nih.gov/pubmed/33431013
http://dx.doi.org/10.1186/s13321-020-00441-8
_version_ 1783540392057110528
author Arús-Pous, Josep
Patronov, Atanas
Bjerrum, Esben Jannik
Tyrchan, Christian
Reymond, Jean-Louis
Chen, Hongming
Engkvist, Ola
author_sort Arús-Pous, Josep
collection PubMed
description Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.
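
The slicing step mentioned in the description (cutting every combination of acyclic bonds to obtain scaffolds with their decorations) can be illustrated with a small sketch. This is not the authors' implementation: it works on a toy heavy-atom graph rather than on SMILES, and the names `components`, `acyclic_bonds`, `slice_molecule`, `ATOMS`, `BONDS` and the `max_cuts` parameter are hypothetical. It assumes that a valid slicing has one scaffold carrying every attachment point and decorations carrying exactly one each.

```python
from itertools import combinations


def components(atoms, bonds):
    """Return the connected components of a graph as frozensets of atoms."""
    adjacency = {atom: set() for atom in atoms}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, found = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        found.append(frozenset(component))
    return found


def acyclic_bonds(atoms, bonds):
    """Bonds outside any ring: removing one disconnects the graph."""
    base = len(components(atoms, bonds))
    return [
        bond for bond in bonds
        if len(components(atoms, [b for b in bonds if b != bond])) > base
    ]


def slice_molecule(atoms, bonds, max_cuts=3):
    """Enumerate (scaffold, decorations) pairs over all acyclic-bond cuts.

    Assumption for this sketch: the scaffold holds one end of every cut
    bond, and each decoration holds exactly one attachment point.
    """
    results = []
    cuttable = acyclic_bonds(atoms, bonds)
    for k in range(1, max_cuts + 1):
        for cuts in combinations(cuttable, k):
            kept = [bond for bond in bonds if bond not in cuts]
            fragments = components(atoms, kept)
            # Attachment points per fragment = cut-bond ends it contains.
            points = {
                frag: sum((a in frag) + (b in frag) for a, b in cuts)
                for frag in fragments
            }
            for scaffold in fragments:
                others = [frag for frag in fragments if frag != scaffold]
                if points[scaffold] == k and all(points[f] == 1 for f in others):
                    results.append((scaffold, others))
    return results


# Toy graph: a six-membered ring (atoms 0-5) with two substituents (6 and 7).
ATOMS = list(range(8))
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7)]

if __name__ == "__main__":
    for scaffold, decorations in slice_molecule(ATOMS, BONDS):
        print(sorted(scaffold), [sorted(d) for d in decorations])
```

On this toy graph the two ring substituent bonds are the only acyclic bonds, so the sketch yields five scaffold-decoration pairs from a single molecule, which hints at why the description calls the procedure a data augmentation technique. A real pipeline would operate on molecules parsed by a cheminformatics toolkit and would apply further filters.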
format Online
Article
Text
id pubmed-7260788
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7260788 2020-06-07 SMILES-based deep generative scaffold decorator for de-novo drug design Arús-Pous, Josep Patronov, Atanas Bjerrum, Esben Jannik Tyrchan, Christian Reymond, Jean-Louis Chen, Hongming Engkvist, Ola J Cheminform Research Article Springer International Publishing 2020-05-29 /pmc/articles/PMC7260788/ /pubmed/33431013 http://dx.doi.org/10.1186/s13321-020-00441-8 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title SMILES-based deep generative scaffold decorator for de-novo drug design
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7260788/
https://www.ncbi.nlm.nih.gov/pubmed/33431013
http://dx.doi.org/10.1186/s13321-020-00441-8