
Selecting molecules with diverse structures and properties by maximizing submodular functions of descriptors learned with graph neural networks

Selecting diverse molecules from unexplored areas of chemical space is one of the most important tasks for discovering novel molecules and reactions. This paper proposes a new approach for selecting a subset of diverse molecules from a given molecular list by using two existing techniques studied in...


Bibliographic Details
Main Authors: Nakamura, Tomohiro, Sakaue, Shinsaku, Fujii, Kaito, Harabuchi, Yu, Maeda, Satoshi, Iwata, Satoru
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8782878/
https://www.ncbi.nlm.nih.gov/pubmed/35064170
http://dx.doi.org/10.1038/s41598-022-04967-9
_version_ 1784638408103034880
author Nakamura, Tomohiro
Sakaue, Shinsaku
Fujii, Kaito
Harabuchi, Yu
Maeda, Satoshi
Iwata, Satoru
author_sort Nakamura, Tomohiro
collection PubMed
description Selecting diverse molecules from unexplored areas of chemical space is one of the most important tasks for discovering novel molecules and reactions. This paper proposes a new approach for selecting a subset of diverse molecules from a given molecular list by using two existing techniques studied in machine learning and mathematical optimization: graph neural networks (GNNs) for learning vector representations of molecules and a diverse-selection framework called submodular function maximization. Our method, called SubMo-GNN, first trains a GNN with property prediction tasks, and then the trained GNN transforms molecular graphs into molecular vectors, which capture both properties and structures of molecules. Finally, to obtain a subset of diverse molecules, we define a submodular function, which quantifies the diversity of molecular vectors, and find a subset of molecular vectors with a large submodular function value. This can be done efficiently by using the greedy algorithm, and the diversity of selected molecules measured by the submodular function value is mathematically guaranteed to be at least 63% of that of an optimal selection. We also introduce a new evaluation criterion to measure the diversity of selected molecules based on molecular properties. Computational experiments confirm that our SubMo-GNN successfully selects diverse molecules from the QM9 dataset regarding the property-based criterion, while performing comparably to existing methods regarding standard structure-based criteria. We also demonstrate that SubMo-GNN with a GNN trained on the QM9 dataset can select diverse molecules even from other MoleculeNet datasets whose domains are different from the QM9 dataset. The proposed method enables researchers to obtain diverse sets of molecules for discovering new molecules and novel chemical reactions, and the proposed diversity criterion is useful for discussing the diversity of molecular libraries from a new property-based perspective.
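The greedy selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which submodular function is used, so the log-determinant function f(S) = log det(I + alpha * X_S X_S^T) is assumed here as one common monotone submodular measure of diversity. The names `log_det_diversity`, `greedy_select`, and `alpha` are hypothetical, and random vectors stand in for the output of a trained GNN. For a monotone submodular f with f(empty set) = 0, greedy selection of k items attains at least a (1 - 1/e) fraction, roughly 63%, of the optimal value, which matches the guarantee quoted in the abstract.

```python
import numpy as np


def log_det_diversity(vectors: np.ndarray, subset: list[int], alpha: float = 1.0) -> float:
    """Assumed diversity score: f(S) = log det(I + alpha * X_S X_S^T), with f(empty) = 0."""
    if not subset:
        return 0.0
    x = vectors[subset]                      # rows of the selected molecular vectors
    gram = x @ x.T                           # pairwise inner products within the subset
    _, logdet = np.linalg.slogdet(np.eye(len(subset)) + alpha * gram)
    return float(logdet)


def greedy_select(vectors: np.ndarray, k: int, alpha: float = 1.0) -> list[int]:
    """Pick up to k row indices, each time adding the row with the largest marginal gain."""
    selected: list[int] = []
    remaining = set(range(len(vectors)))
    current = 0.0
    for _ in range(min(k, len(vectors))):
        best_idx, best_gain = -1, -np.inf
        for i in sorted(remaining):          # sorted so that ties break deterministically
            gain = log_det_diversity(vectors, selected + [i], alpha) - current
            if gain > best_gain:
                best_idx, best_gain = i, gain
        selected.append(best_idx)
        remaining.remove(best_idx)
        current += best_gain
    return selected


if __name__ == "__main__":
    # Stand-in for GNN-derived molecular vectors (assumption: 50 molecules, 8 dimensions).
    rng = np.random.default_rng(0)
    molecule_vectors = rng.normal(size=(50, 8))
    chosen = greedy_select(molecule_vectors, k=5)
    print(chosen, log_det_diversity(molecule_vectors, chosen))
```

This naive version recomputes the determinant for every candidate, which is adequate for small lists; for larger libraries, incremental updates or lazy evaluation of marginal gains would likely be needed.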
format Online
Article
Text
id pubmed-8782878
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8782878 2022-01-25 Selecting molecules with diverse structures and properties by maximizing submodular functions of descriptors learned with graph neural networks Nakamura, Tomohiro Sakaue, Shinsaku Fujii, Kaito Harabuchi, Yu Maeda, Satoshi Iwata, Satoru Sci Rep Article Nature Publishing Group UK 2022-01-21 /pmc/articles/PMC8782878/ /pubmed/35064170 http://dx.doi.org/10.1038/s41598-022-04967-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Selecting molecules with diverse structures and properties by maximizing submodular functions of descriptors learned with graph neural networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8782878/
https://www.ncbi.nlm.nih.gov/pubmed/35064170
http://dx.doi.org/10.1038/s41598-022-04967-9