A quantitative uncertainty metric controls error in neural network-driven chemical discovery
Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is...
Main authors: | Janet, Jon Paul; Duan, Chenru; Yang, Tzuhsiung; Nandy, Aditya; Kulik, Heather J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Royal Society of Chemistry, 2019 |
Subjects: | Chemistry |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6764470/ https://www.ncbi.nlm.nih.gov/pubmed/31588334 http://dx.doi.org/10.1039/c9sc02298h |
_version_ | 1783454382307672064 |
---|---|
author | Janet, Jon Paul; Duan, Chenru; Yang, Tzuhsiung; Nandy, Aditya; Kulik, Heather J. |
author_facet | Janet, Jon Paul; Duan, Chenru; Yang, Tzuhsiung; Nandy, Aditya; Kulik, Heather J. |
author_sort | Janet, Jon Paul |
collection | PubMed |
description | Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning. |
format | Online Article Text |
id | pubmed-6764470 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Royal Society of Chemistry |
record_format | MEDLINE/PubMed |
spelling | pubmed-67644702019-10-04 A quantitative uncertainty metric controls error in neural network-driven chemical discovery Janet, Jon Paul Duan, Chenru Yang, Tzuhsiung Nandy, Aditya Kulik, Heather J. Chem Sci Chemistry Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning. Royal Society of Chemistry 2019-07-11 /pmc/articles/PMC6764470/ /pubmed/31588334 http://dx.doi.org/10.1039/c9sc02298h Text en This journal is © The Royal Society of Chemistry 2019 http://creativecommons.org/licenses/by-nc/3.0/ This article is freely available. This article is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported Licence (CC BY-NC 3.0) |
spellingShingle | Chemistry Janet, Jon Paul Duan, Chenru Yang, Tzuhsiung Nandy, Aditya Kulik, Heather J. A quantitative uncertainty metric controls error in neural network-driven chemical discovery |
title | A quantitative uncertainty metric controls error in neural network-driven chemical discovery |
title_full | A quantitative uncertainty metric controls error in neural network-driven chemical discovery |
title_fullStr | A quantitative uncertainty metric controls error in neural network-driven chemical discovery |
title_full_unstemmed | A quantitative uncertainty metric controls error in neural network-driven chemical discovery |
title_short | A quantitative uncertainty metric controls error in neural network-driven chemical discovery |
title_sort | quantitative uncertainty metric controls error in neural network-driven chemical discovery |
topic | Chemistry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6764470/ https://www.ncbi.nlm.nih.gov/pubmed/31588334 http://dx.doi.org/10.1039/c9sc02298h |
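
The description in this record names the distance to available data in a neural network's latent space as the uncertainty metric. The following is a minimal Python sketch of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the random-weight `latent_features` function stands in for a trained network's last hidden layer, the distance is averaged over the `k` nearest training points, and the cutoff values are arbitrary.

```python
import numpy as np


def latent_features(x, w_hidden, b_hidden):
    """Hypothetical stand-in for the last hidden layer of a trained network."""
    return np.tanh(x @ w_hidden + b_hidden)


def latent_distance(z_query, z_train, k=5):
    """Mean Euclidean distance from each query point to its k nearest
    training points in latent space. Averaging over k neighbours is an
    assumption of this sketch; the record does not state the aggregation."""
    # Pairwise distances between query and training points: (n_query, n_train).
    diffs = z_query[:, None, :] - z_train[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    k = min(k, z_train.shape[0])
    # np.partition places the k smallest distances in the first k columns.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


def within_cutoff(distances, cutoff):
    """Boolean mask of predictions whose latent distance is at most cutoff."""
    return distances <= cutoff


# Synthetic data only; none of these numbers come from the paper.
rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(4, 8))
b_hidden = rng.normal(size=8)
x_train = rng.normal(size=(50, 4))
x_query = rng.normal(size=(20, 4))

z_train = latent_features(x_train, w_hidden, b_hidden)
z_query = latent_features(x_query, w_hidden, b_hidden)
d_query = latent_distance(z_query, z_train, k=5)

# A tighter cutoff can only keep the same or fewer predictions.
for cutoff in (1.0, 0.5):
    kept = int(within_cutoff(d_query, cutoff).sum())
    print(f"cutoff={cutoff}: kept {kept} of {len(d_query)} predictions")
```

In practice the mask returned by `within_cutoff` would decide which predictions to trust and which compounds to send back for further calculation or active learning, with the cutoff calibrated against observed errors.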