
MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods

Well-curated, extensive datasets have helped spur intense development of molecular machine learning (ML) methods over the last few years, encouraging nonchemists to join the effort as well. The QM9 dataset is one of the benchmark databases for small molecules, with molecular energies based on...


Bibliographic Details
Main authors: Nandi, Surajit; Vegge, Tejs; Bhowmik, Arghya
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects: Data Descriptor
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10632468/
https://www.ncbi.nlm.nih.gov/pubmed/37938558
http://dx.doi.org/10.1038/s41597-023-02690-2
author Nandi, Surajit
Vegge, Tejs
Bhowmik, Arghya
collection PubMed
description Well-curated, extensive datasets have helped spur intense development of molecular machine learning (ML) methods over the last few years, encouraging nonchemists to join the effort as well. The QM9 dataset is one of the benchmark databases for small molecules, with molecular energies based on the B3LYP functional. G4MP2-based energies of these molecules were published later. To enable a wide variety of ML tasks with QM9 molecules, such as transfer learning, delta learning, and multitask learning, we introduce in this article a new dataset with QM9 molecule energies estimated with 76 different DFT functionals and three different basis sets (228 energy values for each molecule). We additionally enumerated all possible A ↔ B monomolecular interconversions within the QM9 dataset and provide the reaction energies based on these 76 functionals and three basis sets. Lastly, we also provide the bond changes for all 162 million reactions in the dataset to enable ML-based, structure- and bond-based reaction energy prediction tools.
format Online
Article
Text
id pubmed-10632468
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10632468 2023-11-10 MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods Nandi, Surajit Vegge, Tejs Bhowmik, Arghya Sci Data Data Descriptor Nature Publishing Group UK 2023-11-08 /pmc/articles/PMC10632468/ /pubmed/37938558 http://dx.doi.org/10.1038/s41597-023-02690-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods
topic Data Descriptor
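
The arithmetic in the description (76 functionals times three basis sets giving 228 energies per molecule, and reaction energies for A ↔ B interconversions) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the functional and basis-set labels are placeholders, the energies are toy values rather than dataset entries, and pairing molecules by shared molecular formula is an assumed criterion for an A ↔ B interconversion, not necessarily the authors' procedure. The reaction energy is taken as E(B) - E(A) at the same level of theory.

```python
from itertools import combinations, product

# Placeholder labels only: the real dataset uses 76 named DFT functionals
# and 3 named basis sets, which are not listed in this record.
functionals = [f"xc_{i:02d}" for i in range(76)]
basis_sets = ["basis_a", "basis_b", "basis_c"]

# One "level of theory" per (functional, basis set) combination.
levels = list(product(functionals, basis_sets))


def reaction_energies(energies_a, energies_b):
    """Return E(B) - E(A) for every level of theory present in both dicts."""
    return {
        level: energies_b[level] - energies_a[level]
        for level in energies_a
        if level in energies_b
    }


def enumerate_interconversions(formula_by_molecule):
    """Pair up molecules that share a molecular formula.

    Assumption: an A <-> B monomolecular interconversion connects two
    molecules with identical composition (isomers).
    """
    groups = {}
    for molecule, formula in formula_by_molecule.items():
        groups.setdefault(formula, []).append(molecule)
    pairs = []
    for molecules in groups.values():
        pairs.extend(combinations(sorted(molecules), 2))
    return pairs


# Toy inputs (not taken from the dataset).
formulas = {"mol_1": "C2H6O", "mol_2": "C2H6O", "mol_3": "CH4"}
toy_energies = {
    "mol_1": {level: -155.00 for level in levels},
    "mol_2": {level: -154.99 for level in levels},
}

pairs = enumerate_interconversions(formulas)
delta = reaction_energies(toy_energies["mol_1"], toy_energies["mol_2"])
print(len(levels), pairs, len(delta))
```

Keying energies by the (functional, basis set) pair keeps each reaction energy tied to a single level of theory, so differences are never taken across mismatched methods.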