GEOM, energy-annotated molecular conformations for property prediction and molecular generation
Machine learning (ML) outperforms traditional approaches in many molecular design tasks. ML models usually predict molecular properties from a 2D chemical graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a molecule...
Main authors: | Axelrod, Simon; Gómez-Bombarelli, Rafael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | Data Descriptor |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9023519/ https://www.ncbi.nlm.nih.gov/pubmed/35449137 http://dx.doi.org/10.1038/s41597-022-01288-4 |
_version_ | 1784690369406959616 |
---|---|
author | Axelrod, Simon; Gómez-Bombarelli, Rafael |
author_sort | Axelrod, Simon |
collection | PubMed |
description | Machine learning (ML) outperforms traditional approaches in many molecular design tasks. ML models usually predict molecular properties from a 2D chemical graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a molecule. Property prediction could be improved by using conformer ensembles as input, but there is no large-scale dataset that contains graphs annotated with accurate conformers and experimental data. Here we use advanced sampling and semi-empirical density functional theory (DFT) to generate 37 million molecular conformations for over 450,000 molecules. The Geometric Ensemble Of Molecules (GEOM) dataset contains conformers for 133,000 species from QM9, and 317,000 species with experimental data related to biophysics, physiology, and physical chemistry. Ensembles of 1,511 species with BACE-1 inhibition data are also labeled with high-quality DFT free energies in an implicit water solvent, and 534 ensembles are further optimized with DFT. GEOM will assist in the development of models that predict properties from conformer ensembles, and generative models that sample 3D conformations. |
format | Online Article Text |
id | pubmed-9023519 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9023519 2022-04-28 GEOM, energy-annotated molecular conformations for property prediction and molecular generation Axelrod, Simon Gómez-Bombarelli, Rafael Sci Data Data Descriptor Machine learning (ML) outperforms traditional approaches in many molecular design tasks. ML models usually predict molecular properties from a 2D chemical graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a molecule. Property prediction could be improved by using conformer ensembles as input, but there is no large-scale dataset that contains graphs annotated with accurate conformers and experimental data. Here we use advanced sampling and semi-empirical density functional theory (DFT) to generate 37 million molecular conformations for over 450,000 molecules. The Geometric Ensemble Of Molecules (GEOM) dataset contains conformers for 133,000 species from QM9, and 317,000 species with experimental data related to biophysics, physiology, and physical chemistry. Ensembles of 1,511 species with BACE-1 inhibition data are also labeled with high-quality DFT free energies in an implicit water solvent, and 534 ensembles are further optimized with DFT. GEOM will assist in the development of models that predict properties from conformer ensembles, and generative models that sample 3D conformations. Nature Publishing Group UK 2022-04-21 /pmc/articles/PMC9023519/ /pubmed/35449137 http://dx.doi.org/10.1038/s41597-022-01288-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | GEOM, energy-annotated molecular conformations for property prediction and molecular generation |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9023519/ https://www.ncbi.nlm.nih.gov/pubmed/35449137 http://dx.doi.org/10.1038/s41597-022-01288-4 |