WS22 database, Wigner Sampling and geometry interpolation for configurationally diverse molecular datasets
Multidimensional surfaces of quantum chemical properties, such as potential energies and dipole moments, are common targets for machine learning, requiring the development of robust and diverse databases extensively exploring molecular configurational spaces. Here we composed the WS22 database cover...
Main authors: | Pinheiro Jr, Max; Zhang, Shuang; Dral, Pavlo O.; Barbatti, Mario |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | Data Descriptor |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931705/ https://www.ncbi.nlm.nih.gov/pubmed/36792601 http://dx.doi.org/10.1038/s41597-023-01998-3 |
_version_ | 1784889290125213696 |
---|---|
author | Pinheiro Jr, Max Zhang, Shuang Dral, Pavlo O. Barbatti, Mario |
author_facet | Pinheiro Jr, Max Zhang, Shuang Dral, Pavlo O. Barbatti, Mario |
author_sort | Pinheiro Jr, Max |
collection | PubMed |
description | Multidimensional surfaces of quantum chemical properties, such as potential energies and dipole moments, are common targets for machine learning, requiring the development of robust and diverse databases extensively exploring molecular configurational spaces. Here we composed the WS22 database covering several quantum mechanical (QM) properties (including potential energies, forces, dipole moments, polarizabilities, HOMO, and LUMO energies) for ten flexible organic molecules of increasing complexity and with up to 22 atoms. This database consists of 1.18 million equilibrium and non-equilibrium geometries carefully sampled from Wigner distributions centered at different equilibrium conformations (either at the ground or excited electronic states) and further augmented with interpolated structures. The diversity of our datasets is demonstrated by visualizing the geometries distribution with dimensionality reduction as well as via comparison of statistical features of the QM properties with those available in existing datasets. Our sampling targets broader quantum mechanical distribution of the configurational space than provided by commonly used sampling through classical molecular dynamics, upping the challenge for machine learning models. |
format | Online Article Text |
id | pubmed-9931705 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9931705 2023-02-17 WS22 database, Wigner Sampling and geometry interpolation for configurationally diverse molecular datasets Pinheiro Jr, Max Zhang, Shuang Dral, Pavlo O. Barbatti, Mario Sci Data Data Descriptor Multidimensional surfaces of quantum chemical properties, such as potential energies and dipole moments, are common targets for machine learning, requiring the development of robust and diverse databases extensively exploring molecular configurational spaces. Here we composed the WS22 database covering several quantum mechanical (QM) properties (including potential energies, forces, dipole moments, polarizabilities, HOMO, and LUMO energies) for ten flexible organic molecules of increasing complexity and with up to 22 atoms. This database consists of 1.18 million equilibrium and non-equilibrium geometries carefully sampled from Wigner distributions centered at different equilibrium conformations (either at the ground or excited electronic states) and further augmented with interpolated structures. The diversity of our datasets is demonstrated by visualizing the geometries distribution with dimensionality reduction as well as via comparison of statistical features of the QM properties with those available in existing datasets. Our sampling targets broader quantum mechanical distribution of the configurational space than provided by commonly used sampling through classical molecular dynamics, upping the challenge for machine learning models. Nature Publishing Group UK 2023-02-15 /pmc/articles/PMC9931705/ /pubmed/36792601 http://dx.doi.org/10.1038/s41597-023-01998-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Data Descriptor Pinheiro Jr, Max Zhang, Shuang Dral, Pavlo O. Barbatti, Mario WS22 database, Wigner Sampling and geometry interpolation for configurationally diverse molecular datasets |
title | WS22 database, Wigner Sampling and geometry interpolation for configurationally diverse molecular datasets |
title_full | WS22 database, Wigner Sampling and geometry interpolation for configurationally diverse molecular datasets |
title_fullStr | WS22 database, Wigner Sampling and geometry interpolation for configurationally diverse molecular datasets |
title_full_unstemmed | WS22 database, Wigner Sampling and geometry interpolation for configurationally diverse molecular datasets |
title_short | WS22 database, Wigner Sampling and geometry interpolation for configurationally diverse molecular datasets |
title_sort | ws22 database, wigner sampling and geometry interpolation for configurationally diverse molecular datasets |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931705/ https://www.ncbi.nlm.nih.gov/pubmed/36792601 http://dx.doi.org/10.1038/s41597-023-01998-3 |
work_keys_str_mv | AT pinheirojrmax ws22databasewignersamplingandgeometryinterpolationforconfigurationallydiversemoleculardatasets AT zhangshuang ws22databasewignersamplingandgeometryinterpolationforconfigurationallydiversemoleculardatasets AT dralpavloo ws22databasewignersamplingandgeometryinterpolationforconfigurationallydiversemoleculardatasets AT barbattimario ws22databasewignersamplingandgeometryinterpolationforconfigurationallydiversemoleculardatasets |
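
The description field above names two sampling ideas: drawing geometries from Wigner distributions and augmenting them with interpolated structures. The following is a minimal toy sketch of both, not the authors' pipeline. It assumes a single harmonic mode in its ground state, written in dimensionless coordinates where the Wigner function is W(q, p) = exp(-q² - p²)/π, and it assumes plain linear interpolation in Cartesian coordinates, which may differ from the scheme used for WS22. The function names `sample_wigner_mode` and `interpolate_geometries` are hypothetical.

```python
import math
import random


def sample_wigner_mode(n_samples, rng):
    """Draw (q, p) pairs from the ground-state Wigner distribution of one
    harmonic mode (assumed toy model), using dimensionless coordinates
    q = x * sqrt(m * omega / hbar) and p = p_x / sqrt(m * hbar * omega),
    for which W(q, p) = exp(-q**2 - p**2) / pi."""
    sigma = math.sqrt(0.5)  # each Gaussian factor has variance 1/2
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(n_samples)]


def interpolate_geometries(geom_a, geom_b, n_points):
    """Linearly interpolate Cartesian coordinates between two geometries
    (assumed scheme). Each geometry is a list of (x, y, z) tuples with the
    same atom ordering. Returns n_points geometries, end points included."""
    if len(geom_a) != len(geom_b):
        raise ValueError("geometries must have the same number of atoms")
    if n_points < 2:
        raise ValueError("need at least two points")
    path = []
    for k in range(n_points):
        t = k / (n_points - 1)  # runs from 0.0 (geom_a) to 1.0 (geom_b)
        path.append([
            tuple((1.0 - t) * a + t * b for a, b in zip(atom_a, atom_b))
            for atom_a, atom_b in zip(geom_a, geom_b)
        ])
    return path


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    samples = sample_wigner_mode(5, rng)
    print(samples)

    # Two made-up diatomic geometries that differ only in bond length.
    short_bond = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    long_bond = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    for geometry in interpolate_geometries(short_bond, long_bond, 3):
        print(geometry)
```

For a real molecule, each vibrational normal mode would be sampled this way and the displacements transformed back to Cartesian coordinates with the mode's mass and frequency; that step is omitted here.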