
UnCorrupt SMILES: a novel approach to de novo design

Generative deep learning models have emerged as a powerful approach for de novo drug design as they aid researchers in finding new molecules with desired properties. Despite continuous improvements in the field, a subset of the outputs that sequence-based de novo generators produce cannot be progres...


Bibliographic Details
Main Authors:	Schoenmaker, Linde; Béquignon, Olivier J. M.; Jespers, Willem; van Westen, Gerard J. P.
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9926805/ https://www.ncbi.nlm.nih.gov/pubmed/36788579 http://dx.doi.org/10.1186/s13321-023-00696-x

_version_	1784888354058272768
author	Schoenmaker, Linde Béquignon, Olivier J. M. Jespers, Willem van Westen, Gerard J. P.
author_facet	Schoenmaker, Linde Béquignon, Olivier J. M. Jespers, Willem van Westen, Gerard J. P.
author_sort	Schoenmaker, Linde
collection	PubMed
description	Generative deep learning models have emerged as a powerful approach for de novo drug design as they aid researchers in finding new molecules with desired properties. Despite continuous improvements in the field, a subset of the outputs that sequence-based de novo generators produce cannot be progressed due to errors. Here, we propose to fix these invalid outputs post hoc. In similar tasks, transformer models from the field of natural language processing have been shown to be very effective. Therefore, here this type of model was trained to translate invalid Simplified Molecular-Input Line-Entry System (SMILES) into valid representations. The performance of this SMILES corrector was evaluated on four representative methods of de novo generation: a recurrent neural network (RNN), a target-directed RNN, a generative adversarial network (GAN), and a variational autoencoder (VAE). This study has found that the percentage of invalid outputs from these specific generative models ranges between 4 and 89%, with different models having different error-type distributions. Post hoc correction of SMILES was shown to increase model validity. The SMILES corrector trained with one error per input alters 60–90% of invalid generator outputs and fixes 35–80% of them. However, a higher error detection and performance was obtained for transformer models trained with multiple errors per input. In this case, the best model was able to correct 60–95% of invalid generator outputs. Further analysis showed that these fixed molecules are comparable to the correct molecules from the de novo generators based on novelty and similarity. Additionally, the SMILES corrector can be used to expand the amount of interesting new molecules within the targeted chemical space. Introducing different errors into existing molecules yields novel analogs with a uniqueness of 39% and a novelty of approximately 20%. The results of this research demonstrate that SMILES correction is a viable post hoc extension and can enhance the search for better drug candidates. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-023-00696-x.
format	Online Article Text
id	pubmed-9926805
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-9926805 2023-02-15 UnCorrupt SMILES: a novel approach to de novo design Schoenmaker, Linde Béquignon, Olivier J. M. Jespers, Willem van Westen, Gerard J. P. J Cheminform Research Springer International Publishing 2023-02-14 /pmc/articles/PMC9926805/ /pubmed/36788579 http://dx.doi.org/10.1186/s13321-023-00696-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Schoenmaker, Linde Béquignon, Olivier J. M. Jespers, Willem van Westen, Gerard J. P. UnCorrupt SMILES: a novel approach to de novo design
title	UnCorrupt SMILES: a novel approach to de novo design
title_sort	uncorrupt smiles: a novel approach to de novo design
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9926805/ https://www.ncbi.nlm.nih.gov/pubmed/36788579 http://dx.doi.org/10.1186/s13321-023-00696-x
work_keys_str_mv	AT schoenmakerlinde uncorruptsmilesanovelapproachtodenovodesign AT bequignonolivierjm uncorruptsmilesanovelapproachtodenovodesign AT jesperswillem uncorruptsmilesanovelapproachtodenovodesign AT vanwestengerardjp uncorruptsmilesanovelapproachtodenovodesign
