Uncertainty quantification for predictions of atomistic neural networks

The value of uncertainty quantification on predictions for trained neural networks (NNs) on quantum chemical reference data is quantitatively explored. For this, the architecture of the PhysNet NN was suitably modified and the resulting model (PhysNet-DER) was evaluated with different metrics to qua...

Bibliographic Details
Main authors:	Vazquez-Salazar, Luis Itza, Boittier, Eric D., Meuwly, Markus
Format:	Online Article Text
Language:	English
Published:	The Royal Society of Chemistry 2022
Subjects:	Chemistry
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9667919/ https://www.ncbi.nlm.nih.gov/pubmed/36425481 http://dx.doi.org/10.1039/d2sc04056e

_version_	1784831806173872128
author	Vazquez-Salazar, Luis Itza Boittier, Eric D. Meuwly, Markus
author_facet	Vazquez-Salazar, Luis Itza Boittier, Eric D. Meuwly, Markus
author_sort	Vazquez-Salazar, Luis Itza
collection	PubMed
description	The value of uncertainty quantification on predictions for trained neural networks (NNs) on quantum chemical reference data is quantitatively explored. For this, the architecture of the PhysNet NN was suitably modified and the resulting model (PhysNet-DER) was evaluated with different metrics to quantify its calibration, the quality of its predictions, and whether prediction error and the predicted uncertainty can be correlated. Training on the QM9 database and evaluating data in the test set within and outside the distribution indicate that error and uncertainty are not linearly related. However, the observed variance provides insight into the quality of the data used for training. Additionally, the influence of the chemical space covered by the training data set was studied by using a biased database. The results clarify that noise and redundancy complicate property prediction for molecules even in cases for which changes – such as double bond migration in two otherwise identical molecules – are small. The model was also applied to a real database of tautomerization reactions. Analysis of the distance between members in feature space in combination with other parameters shows that redundant information in the training dataset can lead to large variances and small errors whereas the presence of similar but unspecific information returns large errors but small variances. This was, e.g., observed for nitro-containing aliphatic chains for which predictions were difficult although the training set contained several examples for nitro groups bound to aromatic molecules. The finding underlines the importance of the composition of the training data and provides chemical insight into how this affects the prediction capabilities of a ML model. Finally, the presented method can be used for information-based improvement of chemical databases for target applications through active learning optimization.
format	Online Article Text
id	pubmed-9667919
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	The Royal Society of Chemistry
record_format	MEDLINE/PubMed
spelling	pubmed-9667919 2022-11-23 Uncertainty quantification for predictions of atomistic neural networks Vazquez-Salazar, Luis Itza Boittier, Eric D. Meuwly, Markus Chem Sci Chemistry The value of uncertainty quantification on predictions for trained neural networks (NNs) on quantum chemical reference data is quantitatively explored. For this, the architecture of the PhysNet NN was suitably modified and the resulting model (PhysNet-DER) was evaluated with different metrics to quantify its calibration, the quality of its predictions, and whether prediction error and the predicted uncertainty can be correlated. Training on the QM9 database and evaluating data in the test set within and outside the distribution indicate that error and uncertainty are not linearly related. However, the observed variance provides insight into the quality of the data used for training. Additionally, the influence of the chemical space covered by the training data set was studied by using a biased database. The results clarify that noise and redundancy complicate property prediction for molecules even in cases for which changes – such as double bond migration in two otherwise identical molecules – are small. The model was also applied to a real database of tautomerization reactions. Analysis of the distance between members in feature space in combination with other parameters shows that redundant information in the training dataset can lead to large variances and small errors whereas the presence of similar but unspecific information returns large errors but small variances. This was, e.g., observed for nitro-containing aliphatic chains for which predictions were difficult although the training set contained several examples for nitro groups bound to aromatic molecules. The finding underlines the importance of the composition of the training data and provides chemical insight into how this affects the prediction capabilities of a ML model. Finally, the presented method can be used for information-based improvement of chemical databases for target applications through active learning optimization. The Royal Society of Chemistry 2022-10-17 /pmc/articles/PMC9667919/ /pubmed/36425481 http://dx.doi.org/10.1039/d2sc04056e Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by-nc/3.0/
spellingShingle	Chemistry Vazquez-Salazar, Luis Itza Boittier, Eric D. Meuwly, Markus Uncertainty quantification for predictions of atomistic neural networks
title	Uncertainty quantification for predictions of atomistic neural networks
title_full	Uncertainty quantification for predictions of atomistic neural networks
title_fullStr	Uncertainty quantification for predictions of atomistic neural networks
title_full_unstemmed	Uncertainty quantification for predictions of atomistic neural networks
title_short	Uncertainty quantification for predictions of atomistic neural networks
title_sort	uncertainty quantification for predictions of atomistic neural networks
topic	Chemistry
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9667919/ https://www.ncbi.nlm.nih.gov/pubmed/36425481 http://dx.doi.org/10.1039/d2sc04056e
work_keys_str_mv	AT vazquezsalazarluisitza uncertaintyquantificationforpredictionsofatomisticneuralnetworks AT boittierericd uncertaintyquantificationforpredictionsofatomisticneuralnetworks AT meuwlymarkus uncertaintyquantificationforpredictionsofatomisticneuralnetworks
