
Regularized machine learning on molecular graph model explains systematic error in DFT enthalpies

A major goal of materials research is the discovery of novel and efficient heterogeneous catalysts for various chemical processes. In such studies, the candidate catalyst material is modeled using tens to thousands of chemical species and elementary reactions. Density Functional Theory (DFT) is wide...


Bibliographic Details
Main authors: Bhattacharjee, Himaghna; Anesiadis, Nikolaos; Vlachos, Dionisios G.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8277863/
https://www.ncbi.nlm.nih.gov/pubmed/34257362
http://dx.doi.org/10.1038/s41598-021-93854-w
_version_ 1783722144691126272
author Bhattacharjee, Himaghna
Anesiadis, Nikolaos
Vlachos, Dionisios G.
collection PubMed
description A major goal of materials research is the discovery of novel and efficient heterogeneous catalysts for various chemical processes. In such studies, the candidate catalyst material is modeled using tens to thousands of chemical species and elementary reactions. Density Functional Theory (DFT) is widely used to calculate the thermochemistry of these species, which may be surface species or gas-phase molecules. The use of an approximate exchange-correlation functional in the DFT framework introduces an important source of error in such models. This is especially true for gas-phase molecules, whose thermochemistry is calculated using the same plane-wave basis set as the rest of the surface mechanism. Unfortunately, the nature and magnitude of these errors are unknown for most practical molecules. Here, we investigate the error in the enthalpy of formation for 1676 gaseous species using two different DFT levels of theory and the ‘ground truth values’ obtained from the NIST database. We featurize molecules using graph theory. We use a regularized algorithm to discover a sparse model of the error and identify important molecular fragments that drive this error. The model is robust to rigorous statistical tests and is used to correct DFT thermochemistry, achieving more than an order of magnitude improvement.
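The description above mentions fitting a sparse, regularized model of the DFT error on graph-derived molecular features. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes L1 (lasso) regularization solved by coordinate descent, assumes the error is defined as the DFT value minus the reference value, and uses a made-up toy table of fragment counts. The fragment names, the numbers, and the helpers `fit_lasso` and `correct_enthalpy` are all hypothetical.

```python
# Toy sketch: sparse linear model of DFT enthalpy error from fragment counts.
# Assumptions (not from the record): lasso regularization, error = DFT - reference,
# and entirely synthetic illustration data.

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(features, targets, penalty, n_sweeps=100):
    """Minimize (1/2n) * sum((y - X w)^2) + penalty * sum(|w|) by coordinate descent."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for row, y in zip(features, targets):
                full = sum(w * x for w, x in zip(weights, row))
                without_j = full - weights[j] * row[j]
                rho += row[j] * (y - without_j)
                norm += row[j] ** 2
            if norm == 0.0:
                weights[j] = 0.0
            else:
                weights[j] = soft_threshold(rho / n_samples, penalty) / (norm / n_samples)
    return weights

def correct_enthalpy(dft_value, fragment_counts, weights):
    """Subtract the modeled error from a DFT enthalpy of formation."""
    modeled_error = sum(w * x for w, x in zip(weights, fragment_counts))
    return dft_value - modeled_error

# Hypothetical fragment counts per molecule: columns are frag_a, frag_b, frag_c.
fragment_counts = [
    [2, 0, 1],
    [0, 1, 0],
    [0, 2, 1],
    [3, 0, 0],
    [1, 1, 0],
    [0, 2, 1],
]
# Synthetic errors (arbitrary units) driven only by frag_b: 3.0 per occurrence.
errors = [3.0 * row[1] for row in fragment_counts]

weights = fit_lasso(fragment_counts, errors, penalty=0.6)
corrected = correct_enthalpy(-100.0, [0, 2, 1], weights)
```

On this toy data the fit keeps only the `frag_b` weight (shrunk slightly below 3.0 by the penalty) and sets the other two to zero, which is the kind of sparsity that lets a model point to the fragments driving the error. A real study would use a tested library solver and cross-validated penalty selection rather than this hand-rolled loop.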
format Online
Article
Text
id pubmed-8277863
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8277863 2021-07-15 Sci Rep Article
Nature Publishing Group UK 2021-07-13 /pmc/articles/PMC8277863/ /pubmed/34257362 http://dx.doi.org/10.1038/s41598-021-93854-w Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Regularized machine learning on molecular graph model explains systematic error in DFT enthalpies
topic Article