
Transmol: repurposing a language model for molecular generation

Recent advances in convolutional neural networks have inspired the application of deep learning to other disciplines. Although image processing and natural language processing have proven the most successful, many other domains have also benefited, among them the life sciences in general and chemistry and drug design in particular. In line with this trend, since 2018 the scientific community has seen a surge of methodologies for generating diverse molecular libraries with machine learning. To date, however, attention mechanisms have not been employed for the problem of de novo molecular generation. Here we apply a variant of the transformer, an architecture recently developed for natural language processing, to this task. Our results indicate that the adapted Transmol model is indeed applicable to the generation of molecular libraries and yields statistically significant improvements in several core metrics of the MOSES benchmark. The model can be tuned to either input-guided or diversity-driven generation by applying a standard one-seed approach or a novel two-seed approach, respectively. The one-seed approach is best suited to the targeted generation of focused libraries composed of close analogues of the seed structure, while the two-seed approach allows a deeper dive into under-explored regions of chemical space by attempting to generate molecules that resemble both seeds. To gain more insight into the scope of the one-seed approach, we devised a new validation workflow that involves recreating known ligands of an important biological target, the vitamin D receptor. To further benefit the chemical community, the Transmol algorithm has been incorporated into our cheML.io web database of ML-generated molecules as a second-generation on-demand methodology.
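The record contains no code, so the following is a minimal, illustrative sketch of the idea the abstract describes: a sequence-to-sequence transformer that decodes SMILES strings while cross-attending to either one encoded seed molecule (input-guided, focused generation) or the concatenated encodings of two seeds (diversity-driven generation). The class names, hyperparameters, and toy vocabulary below are assumptions for illustration only, not the authors' implementation.

```python
# Illustrative sketch only -- not the authors' Transmol implementation.
# One seed -> one encoder memory (focused generation); two seeds ->
# concatenated memories, so the decoder attends to both molecules at once.
import torch
import torch.nn as nn

VOCAB = ["<pad>", "<bos>", "<eos>", "C", "c", "O", "N", "(", ")", "1", "="]
STOI = {t: i for i, t in enumerate(VOCAB)}

def encode(smiles: str) -> torch.Tensor:
    """Character-level tokenisation of a SMILES string (toy vocabulary)."""
    ids = [STOI["<bos>"]] + [STOI[c] for c in smiles if c in STOI] + [STOI["<eos>"]]
    return torch.tensor(ids).unsqueeze(1)           # shape: (seq_len, batch=1)

class SeedConditionedTransformer(nn.Module):
    def __init__(self, d_model: int = 64, nhead: int = 4, layers: int = 2):
        super().__init__()
        self.tok = nn.Embedding(len(VOCAB), d_model)
        self.pos = nn.Embedding(512, d_model)       # learned positions, max len 512
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead), num_layers=layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead), num_layers=layers)
        self.out = nn.Linear(d_model, len(VOCAB))

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        pos = torch.arange(x.size(0), device=x.device).unsqueeze(1)
        return self.tok(x) + self.pos(pos)

    def forward(self, seeds: list, tgt: torch.Tensor) -> torch.Tensor:
        # Concatenate the encoder memories of all seeds along the sequence axis.
        memory = torch.cat([self.encoder(self.embed(s)) for s in seeds], dim=0)
        n = tgt.size(0)                              # causal mask for decoding
        mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        return self.out(self.decoder(self.embed(tgt), memory, tgt_mask=mask))

model = SeedConditionedTransformer()
one_seed = [encode("CCO")]                           # input-guided mode
two_seeds = [encode("CCO"), encode("c1ccccc1")]      # diversity-driven mode
focused = model(one_seed, encode("CC"))              # analogues of one seed
diverse = model(two_seeds, encode("CC"))             # blend of two seeds
print(focused.shape, diverse.shape)                  # (tgt_len, 1, |VOCAB|) each
```

At sampling time, tokens would be drawn autoregressively from the decoder's output distribution; the two-seed mode simply swaps in a second encoder memory, which is what lets it steer generation toward molecules resembling both inputs.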

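The abstract's claims are evaluated on MOSES (molecularsets/moses), an open-source benchmark that scores a generated library on validity, uniqueness, novelty, FCD, and related core metrics. As a hedged usage sketch, its single entry point looks like this; the three placeholder SMILES stand in for a Transmol-generated library, which in practice would contain thousands of molecules.

```python
# Hedged usage sketch of the MOSES benchmark used by the paper for
# evaluation. The placeholder SMILES below are not from the paper.
import moses

generated = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]   # placeholder output
metrics = moses.get_all_metrics(generated)  # validity, uniqueness, novelty, FCD, ...
print(metrics)
```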

Bibliographic Details
Main Authors: Zhumagambetov, Rustam; Molnár, Ferdinand; Peshkov, Vsevolod A.; Fazli, Siamac
Format: Online Article, Text
Language: English
Published: The Royal Society of Chemistry, 2021
Subjects: Chemistry
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9037129/
https://www.ncbi.nlm.nih.gov/pubmed/35479483
http://dx.doi.org/10.1039/d1ra03086h
Journal: RSC Adv (Chemistry)
Published online: 2021-07-27
License: This journal is © The Royal Society of Chemistry; CC BY-NC 3.0 (https://creativecommons.org/licenses/by-nc/3.0/)