A machine learning correction for DFT non-covalent interactions based on the S22, S66 and X40 benchmark databases

BACKGROUND: Non-covalent interactions (NCIs) play critical roles in supramolecular chemistries; however, they are difficult to measure. Currently, reliable computational methods are being pursued to meet this challenge, but the accuracy of calculations based on low levels of theory is not satisfacto...

Bibliographic Details
Main Authors:	Gao, Ting; Li, Hongzhi; Li, Wenze; Li, Lin; Fang, Chao; Li, Hui; Hu, LiHong; Lu, Yinghua; Su, Zhong-Min
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4855356/ https://www.ncbi.nlm.nih.gov/pubmed/27148408 http://dx.doi.org/10.1186/s13321-016-0133-7

author	Gao, Ting; Li, Hongzhi; Li, Wenze; Li, Lin; Fang, Chao; Li, Hui; Hu, LiHong; Lu, Yinghua; Su, Zhong-Min
collection	PubMed
description	BACKGROUND: Non-covalent interactions (NCIs) play critical roles in supramolecular chemistry; however, they are difficult to measure. Reliable computational methods are being pursued to meet this challenge, but calculations based on low levels of theory are not accurate enough and calculations based on high levels of theory are often too costly. Accordingly, to reduce the cost and increase the accuracy of low-level theoretical descriptions of NCIs, an efficient approach is proposed to correct NCI calculations based on the benchmark databases S22, S66 and X40 (Hobza in Acc Chem Res 45: 663–672, 2012; Řezáč et al. in J Chem Theory Comput 8:4285, 2012). RESULTS: A novel type of NCI correction is presented for density functional theory (DFT) methods. In this approach, the general regression neural network machine learning method is used to correct DFT results on the basis of the DFT calculations themselves. Various DFT methods, including M06-2X, B3LYP, B3LYP-D3, PBE, PBE-D3 and ωB97XD, with two small basis sets (i.e., 6-31G* and 6-31+G*) were investigated. Moreover, the conductor-like polarizable continuum model with two types of solvents (i.e., water and pentylamine, which mimics a protein environment with ε = 4.2) was considered in the DFT calculations. With the correction, the root mean square errors of all DFT calculations were improved by at least 70%. Relative to CCSD(T)/CBS benchmark values (used as experimental NCI values because of their high accuracy), the mean absolute error of the best result was 0.33 kcal/mol, which is comparable to high-level ab initio methods or DFT methods with fairly large basis sets. Notably, this level of accuracy is achieved within a fraction of the time required by other methods. For all of the correction models based on various DFT approaches, the validation parameters according to OECD principles (i.e., the correlation coefficient R², the predictive squared correlation coefficient q² and [Formula: see text] from cross-validation) were >0.92, which suggests that the correction model has good stability, robustness and predictive power. CONCLUSIONS: The correction can be added following DFT calculations. With the obtained molecular descriptors, the NCIs produced by DFT methods can be improved to achieve high-level accuracy. Moreover, only one parameter is introduced into the correction model, which makes it easily applicable. Overall, this work demonstrates that the correction model may be an alternative to the traditional means of correcting for NCIs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0133-7) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4855356
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-4855356 2016-05-05 J Cheminform Research Article Springer International Publishing 2016-05-03 /pmc/articles/PMC4855356/ /pubmed/27148408 http://dx.doi.org/10.1186/s13321-016-0133-7 Text en © Gao et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A machine learning correction for DFT non-covalent interactions based on the S22, S66 and X40 benchmark databases
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4855356/ https://www.ncbi.nlm.nih.gov/pubmed/27148408 http://dx.doi.org/10.1186/s13321-016-0133-7