Selecting molecules with diverse structures and properties by maximizing submodular functions of descriptors learned with graph neural networks
Main authors: | Nakamura, Tomohiro; Sakaue, Shinsaku; Fujii, Kaito; Harabuchi, Yu; Maeda, Satoshi; Iwata, Satoru |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8782878/ https://www.ncbi.nlm.nih.gov/pubmed/35064170 http://dx.doi.org/10.1038/s41598-022-04967-9 |
author | Nakamura, Tomohiro Sakaue, Shinsaku Fujii, Kaito Harabuchi, Yu Maeda, Satoshi Iwata, Satoru |
---|---|
collection | PubMed |
description | Selecting diverse molecules from unexplored areas of chemical space is one of the most important tasks for discovering novel molecules and reactions. This paper proposes a new approach for selecting a subset of diverse molecules from a given molecular list by using two existing techniques studied in machine learning and mathematical optimization: graph neural networks (GNNs) for learning vector representations of molecules and a diverse-selection framework called submodular function maximization. Our method, called SubMo-GNN, first trains a GNN with property prediction tasks, and then the trained GNN transforms molecular graphs into molecular vectors, which capture both properties and structures of molecules. Finally, to obtain a subset of diverse molecules, we define a submodular function, which quantifies the diversity of molecular vectors, and find a subset of molecular vectors with a large submodular function value. This can be done efficiently by using the greedy algorithm, and the diversity of selected molecules measured by the submodular function value is mathematically guaranteed to be at least 63% of that of an optimal selection. We also introduce a new evaluation criterion to measure the diversity of selected molecules based on molecular properties. Computational experiments confirm that our SubMo-GNN successfully selects diverse molecules from the QM9 dataset with respect to the property-based criterion, while performing comparably to existing methods with respect to standard structure-based criteria. We also demonstrate that SubMo-GNN with a GNN trained on the QM9 dataset can select diverse molecules even from other MoleculeNet datasets whose domains differ from that of the QM9 dataset. The proposed method enables researchers to obtain diverse sets of molecules for discovering new molecules and novel chemical reactions, and the proposed diversity criterion is useful for discussing the diversity of molecular libraries from a new property-based perspective. |
format | Online Article Text |
id | pubmed-8782878 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8782878 2022-01-25 Sci Rep Article Nature Publishing Group UK 2022-01-21 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Selecting molecules with diverse structures and properties by maximizing submodular functions of descriptors learned with graph neural networks |
topic | Article |
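The description above says diversity is scored by a submodular function and maximized with the greedy algorithm, with a guarantee of at least 63% of the optimum. That figure matches the classical bound for greedy maximization of a monotone submodular function under a cardinality constraint, 1 − 1/e ≈ 0.632. The sketch below illustrates only that greedy step on toy vectors standing in for GNN-produced molecular vectors. It is a minimal sketch under stated assumptions: the objective (a facility-location function with a Gaussian/RBF similarity) and the names `rbf_similarity`, `facility_location`, and `greedy_select` are hypothetical choices for illustration, not necessarily the submodular function SubMo-GNN defines.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_similarity(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: always in (0, 1], equal to 1 when a == b."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def facility_location(selected: List[int], sim: List[List[float]]) -> float:
    """Hypothetical objective f(S) = sum_i max_{j in S} sim[i][j].

    Monotone submodular because every sim entry is nonnegative; f(empty) = 0.
    """
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(vectors: List[Vector], k: int, gamma: float = 1.0) -> List[int]:
    """Pick up to k indices, each time adding the item with the largest marginal gain."""
    n = len(vectors)
    sim = [[rbf_similarity(vectors[i], vectors[j], gamma) for j in range(n)]
           for i in range(n)]
    # best[i] = similarity of item i to its closest selected item (0 while S is empty)
    best = [0.0] * n
    selected: List[int] = []
    for _ in range(min(k, n)):
        gains = []
        for j in range(n):
            if j in selected:
                gains.append(-1.0)  # real gains are >= 0, so chosen items never win
                continue
            # f(S + {j}) - f(S) = sum_i max(sim[i][j] - best[i], 0)
            gains.append(sum(max(sim[i][j] - best[i], 0.0) for i in range(n)))
        j_star = max(range(n), key=lambda j: gains[j])
        selected.append(j_star)
        best = [max(best[i], sim[i][j_star]) for i in range(n)]
    return selected


if __name__ == "__main__":
    # Toy 2-D "molecular vectors": two well-separated clusters (hypothetical data)
    vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
    print(greedy_select(vectors, k=2))  # one index from each cluster
```

Tracking `best` lets each candidate's marginal gain be computed directly instead of re-evaluating f from scratch; `facility_location` is kept separate so the greedy result can be checked against a brute-force optimum on small inputs.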