MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods
Well-curated, extensive datasets have helped spur intense molecular machine learning (ML) method development over the last few years, encouraging nonchemists to join the effort as well. The QM9 dataset is one of the benchmark databases for small molecules, with molecular energies based on...
Main authors: | Nandi, Surajit, Vegge, Tejs, Bhowmik, Arghya |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10632468/ https://www.ncbi.nlm.nih.gov/pubmed/37938558 http://dx.doi.org/10.1038/s41597-023-02690-2 |
_version_ | 1785132584046428160 |
---|---|
author | Nandi, Surajit Vegge, Tejs Bhowmik, Arghya |
author_sort | Nandi, Surajit |
collection | PubMed |
description | Well-curated, extensive datasets have helped spur intense molecular machine learning (ML) method development over the last few years, encouraging nonchemists to join the effort as well. The QM9 dataset is one of the benchmark databases for small molecules, with molecular energies based on the B3LYP functional. G4MP2-based energies of these molecules were published later. To enable a wide variety of ML tasks with QM9 molecules, such as transfer learning, delta learning, and multitask learning, in this article we introduce a new dataset with QM9 molecule energies estimated with 76 different DFT functionals and three different basis sets (228 energy values for each molecule). We additionally enumerated all possible A ↔ B monomolecular interconversions within the QM9 dataset and provide the reaction energies for these 76 functionals and three basis sets. Lastly, we also provide the bond changes for all 162 million reactions in the dataset to enable structure- and bond-based ML tools for reaction energy prediction. |
format | Online Article Text |
id | pubmed-10632468 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10632468 2023-11-10 MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods Nandi, Surajit Vegge, Tejs Bhowmik, Arghya Sci Data Data Descriptor Nature Publishing Group UK 2023-11-08 /pmc/articles/PMC10632468/ /pubmed/37938558 http://dx.doi.org/10.1038/s41597-023-02690-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10632468/ https://www.ncbi.nlm.nih.gov/pubmed/37938558 http://dx.doi.org/10.1038/s41597-023-02690-2 |
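
The arithmetic and the reaction enumeration described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout, the molecule identifiers, and the energy values below are hypothetical placeholders. It also assumes that an A ↔ B monomolecular interconversion pairs two molecules with the same molecular formula, and that the reaction energy is taken as E(B) − E(A) at a single functional and basis set.

```python
from collections import defaultdict
from itertools import combinations

# Counts stated in the abstract: 76 functionals x 3 basis sets.
N_FUNCTIONALS = 76
N_BASIS_SETS = 3
ENERGIES_PER_MOLECULE = N_FUNCTIONALS * N_BASIS_SETS

# Hypothetical toy records: (molecule id, molecular formula, energy).
# The ids and energy values are invented for illustration only.
molecules = [
    ("mol_a", "C2H6O", -30.0),
    ("mol_b", "C2H6O", -29.5),
    ("mol_c", "C3H8", -40.0),
]


def enumerate_reactions(records):
    """Pair up molecules that share a formula and return (A, B, E_B - E_A)."""
    by_formula = defaultdict(list)
    for mol_id, formula, energy in records:
        by_formula[formula].append((mol_id, energy))

    reactions = []
    for isomers in by_formula.values():
        # Each unordered pair stands for one A <-> B interconversion;
        # the reverse direction simply flips the sign of the energy.
        for (a, e_a), (b, e_b) in combinations(isomers, 2):
            reactions.append((a, b, e_b - e_a))
    return reactions


reactions = enumerate_reactions(molecules)
print(ENERGIES_PER_MOLECULE)
print(reactions)
```

In this toy input only the two C2H6O entries form a reaction; the C3H8 entry has no isomer partner and contributes none. Repeating the subtraction for every functional and basis set combination would give one reaction energy per level of theory.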