Bayesian molecular design with a chemical language model

Bibliographic Details
Main authors: Ikebata, Hisaki; Hongo, Kenta; Isomura, Tetsu; Maezono, Ryo; Yoshida, Ryo
Format: Online Article Text
Language: English
Published: Springer International Publishing 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5393296/
https://www.ncbi.nlm.nih.gov/pubmed/28281211
http://dx.doi.org/10.1007/s10822-016-0008-z
collection PubMed
description The aim of computational molecular design is to identify promising hypothetical molecules with a predefined set of desired properties. We address the issue of accelerating materials discovery with state-of-the-art machine learning techniques. The method involves two types of prediction: forward and backward. The objective of the forward prediction is to build a set of machine learning models that predict various properties of a given molecule. Inverting the trained forward models through Bayes’ law, we derive a posterior distribution for the backward prediction, conditioned on a desired property requirement. By exploring high-probability regions of the posterior with a sequential Monte Carlo technique, we can computationally create molecules that exhibit the desired properties. One major difficulty in the computational creation of molecules is excluding chemically unfavorable structures. To circumvent this issue, we derive a chemical language model that learns commonly occurring patterns of chemical fragments through natural language processing of the ASCII strings of existing compounds, written in the SMILES chemical language notation. In the backward prediction, the trained language model is used to refine chemical strings such that the properties of the resulting structures fall within the desired property region while chemically unfavorable structures are removed. The present method is demonstrated through the design of small organic molecules with property requirements on the HOMO-LUMO gap and internal energy. The R package iqspr is available at the CRAN repository. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s10822-016-0008-z) contains supplementary material, which is available to authorized users.
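The backward prediction summarized above (inverting forward models with Bayes' law, p(S | Y ∈ U) ∝ p(Y ∈ U | S) p(S), where S is a SMILES string, Y its properties and U the desired property region, then exploring that posterior with sequential Monte Carlo under a language-model prior) can be illustrated with a minimal Python sketch. This is not the iqspr implementation. Everything here is an assumption chosen for brevity: the character-level n-gram prior, the hypothetical `toy_forward_model`, all function names, and the tempered SMC sampler whose Metropolis-Hastings moves redraw whole strings from the language model, whereas the abstract describes using the language model to refine existing chemical strings.

```python
import math
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # padding symbols; assumed absent from the SMILES corpus


def train_ngram(smiles_list, n=3):
    """Count character-level n-gram continuations (no smoothing).

    ASSUMPTION: character-level tokens are a simplification; a real SMILES
    tokenizer would treat 'Cl', 'Br' or '[nH]' as single tokens.
    """
    counts = defaultdict(Counter)
    for s in smiles_list:
        padded = START * (n - 1) + s + END
        for i in range(n - 1, len(padded)):
            counts[padded[i - n + 1:i]][padded[i]] += 1
    return counts


def sample_ngram(counts, n, rng, max_len=60):
    """Draw one string from the n-gram prior, retrying if it exceeds max_len."""
    while True:
        context, out = START * (n - 1), []
        while len(out) <= max_len:
            options = counts[context]
            ch = rng.choices(list(options), weights=list(options.values()))[0]
            if ch == END:
                return "".join(out)
            out.append(ch)
            context = (context + ch)[1:]  # slide the (n-1)-character window


def toy_forward_model(smiles):
    """HYPOTHETICAL stand-in for a trained forward model: returns the
    (mean, std) of a Gaussian predictive distribution for one property."""
    mean = 1.0 * smiles.count("C") + 2.0 * smiles.count("O") + 1.5 * smiles.count("N")
    return mean, 1.0


def log_likelihood(smiles, lo, hi):
    """log p(Y in [lo, hi] | S) under the Gaussian predictive distribution."""
    mean, std = toy_forward_model(smiles)

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

    # The 1e-300 floor avoids log(0); it perturbs the target only negligibly.
    return math.log(max(cdf(hi) - cdf(lo), 1e-300))


def smc_design(counts, n, lo, hi, n_particles=200, n_steps=10, seed=0):
    """Tempered SMC targeting p(Y in U | S)**beta * p_LM(S), beta: 0 -> 1."""
    rng = random.Random(seed)
    particles = [sample_ngram(counts, n, rng) for _ in range(n_particles)]
    loglik = [log_likelihood(s, lo, hi) for s in particles]
    betas = [t / n_steps for t in range(n_steps + 1)]
    for b_prev, b in zip(betas, betas[1:]):
        # 1) Reweight by the incremental tempered likelihood (stable in log space).
        logw = [(b - b_prev) * ll for ll in loglik]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        # 2) Multinomial resampling.
        idx = rng.choices(range(n_particles), weights=w, k=n_particles)
        particles = [particles[i] for i in idx]
        loglik = [loglik[i] for i in idx]
        # 3) One Metropolis-Hastings move per particle. The proposal is an
        #    independent draw from the language model, so the prior cancels
        #    and only the tempered likelihood ratio remains.
        for i in range(n_particles):
            proposal = sample_ngram(counts, n, rng)
            ll_new = log_likelihood(proposal, lo, hi)
            if rng.random() < math.exp(min(0.0, b * (ll_new - loglik[i]))):
                particles[i], loglik[i] = proposal, ll_new
    return particles


if __name__ == "__main__":
    training = ["CCO", "CCN", "CCCO", "c1ccccc1", "CC(=O)O", "CCOC", "OCCO", "NCCO"]
    counts = train_ngram(training, n=3)
    designs = smc_design(counts, 3, lo=4.0, hi=6.0)
    for smiles, k in Counter(designs).most_common(5):
        print(smiles, k)
```

Because the sampler only follows continuations observed in training, every generated string is assembled from fragments seen in the corpus, which is the role the abstract assigns to the language model. The independence proposal keeps the acceptance ratio simple but accepts rarely when the posterior is narrow; local string edits guided by the language model would likely mix better.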
id pubmed-5393296
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5393296 2017-05-02 Bayesian molecular design with a chemical language model Ikebata, Hisaki Hongo, Kenta Isomura, Tetsu Maezono, Ryo Yoshida, Ryo J Comput Aided Mol Des Article Springer International Publishing 2017-03-09 2017 /pmc/articles/PMC5393296/ /pubmed/28281211 http://dx.doi.org/10.1007/s10822-016-0008-z Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
topic Article