Geometry-Complete Diffusion for 3D Molecule Generation and Optimization

Denoising diffusion probabilistic models (DDPMs) have recently taken the field of generative modeling by storm, pioneering new state-of-the-art results in disciplines such as computer vision and computational biology for diverse tasks ranging from text-guided image generation to structure-guided pro...

Bibliographic Details
Main Authors: Morehead, Alex; Cheng, Jianlin
Format: Online Article Text
Language: English
Published: Cornell University 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9934735/
https://www.ncbi.nlm.nih.gov/pubmed/36798459
author Morehead, Alex
Cheng, Jianlin
collection PubMed
description Denoising diffusion probabilistic models (DDPMs) have recently taken the field of generative modeling by storm, pioneering new state-of-the-art results in disciplines such as computer vision and computational biology for diverse tasks ranging from text-guided image generation to structure-guided protein design. Along this latter line of research, methods have recently been proposed for generating 3D molecules using equivariant graph neural networks (GNNs) within a DDPM framework. However, such methods are unable to learn important geometric and physical properties of 3D molecules during molecular graph generation, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which negatively impacts their ability to effectively scale to datasets of large 3D molecules. In this work, we address these gaps by introducing the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation, which outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings for the QM9 dataset as well as for the larger GEOM-Drugs dataset. Importantly, we demonstrate that the geometry-complete denoising process GCDM learns for 3D molecule generation allows the model to generate realistic and stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn. Additionally, we show that GCDM’s geometric features can effectively be repurposed to directly optimize the geometry and chemical composition of existing 3D molecules for specific molecular properties, demonstrating new, real-world versatility of molecular diffusion models. Our source code, data, and reproducibility instructions are freely available at https://github.com/BioinfoMachineLearning/bio-diffusion.
format Online
Article
Text
id pubmed-9934735
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cornell University
record_format MEDLINE/PubMed
spelling pubmed-9934735 2023-02-17 Geometry-Complete Diffusion for 3D Molecule Generation and Optimization Morehead, Alex Cheng, Jianlin ArXiv Article
Cornell University 2023-06-17 /pmc/articles/PMC9934735/ /pubmed/36798459 Text en
license This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Geometry-Complete Diffusion for 3D Molecule Generation and Optimization
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9934735/
https://www.ncbi.nlm.nih.gov/pubmed/36798459