
A quantitative uncertainty metric controls error in neural network-driven chemical discovery


Bibliographic Details
Main authors: Janet, Jon Paul; Duan, Chenru; Yang, Tzuhsiung; Nandy, Aditya; Kulik, Heather J.
Format: Online Article Text
Language: English
Published: Royal Society of Chemistry 2019
Subjects: Chemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6764470/
https://www.ncbi.nlm.nih.gov/pubmed/31588334
http://dx.doi.org/10.1039/c9sc02298h
author Janet, Jon Paul
Duan, Chenru
Yang, Tzuhsiung
Nandy, Aditya
Kulik, Heather J.
collection PubMed
description Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
format Online
Article
Text
id pubmed-6764470
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Royal Society of Chemistry
record_format MEDLINE/PubMed
spelling pubmed-6764470 2019-10-04 A quantitative uncertainty metric controls error in neural network-driven chemical discovery. Janet, Jon Paul; Duan, Chenru; Yang, Tzuhsiung; Nandy, Aditya; Kulik, Heather J. Chem Sci. Chemistry. Royal Society of Chemistry 2019-07-11. /pmc/articles/PMC6764470/ /pubmed/31588334 http://dx.doi.org/10.1039/c9sc02298h Text en. This journal is © The Royal Society of Chemistry 2019. http://creativecommons.org/licenses/by-nc/3.0/ This article is freely available. This article is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported Licence (CC BY-NC 3.0).
title A quantitative uncertainty metric controls error in neural network-driven chemical discovery
topic Chemistry