Multi-fidelity prediction of molecular optical peaks with deep learning

Optical properties are central to molecular design for many applications, including solar cells and biomedical imaging. A variety of ab initio and statistical methods have been developed for their prediction, each with a trade-off between accuracy, generality, and cost. Existing theoretical methods...

Bibliographic Details
Main Authors: Greenman, Kevin P., Green, William H., Gómez-Bombarelli, Rafael
Format: Online Article Text
Language: English
Published: The Royal Society of Chemistry 2022
Subjects: Chemistry
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8790778/
https://www.ncbi.nlm.nih.gov/pubmed/35211282
http://dx.doi.org/10.1039/d1sc05677h
_version_ 1784640090913374208
author Greenman, Kevin P.
Green, William H.
Gómez-Bombarelli, Rafael
author_sort Greenman, Kevin P.
collection PubMed
description Optical properties are central to molecular design for many applications, including solar cells and biomedical imaging. A variety of ab initio and statistical methods have been developed for their prediction, each with a trade-off between accuracy, generality, and cost. Existing theoretical methods such as time-dependent density functional theory (TD-DFT) are generalizable across chemical space because of their robust physics-based foundations but still exhibit random and systematic errors with respect to experiment despite their high computational cost. Statistical methods can achieve high accuracy at a lower cost, but data sparsity and unoptimized molecule and solvent representations often limit their ability to generalize. Here, we utilize directed message passing neural networks (D-MPNNs) to represent both dye molecules and solvents for predictions of molecular absorption peaks in solution. Additionally, we demonstrate a multi-fidelity approach based on an auxiliary model trained on over 28 000 TD-DFT calculations that further improves accuracy and generalizability, as shown through rigorous splitting strategies. Combining several openly-available experimental datasets, we benchmark these methods against a state-of-the-art regression tree algorithm and compare the D-MPNN solvent representation to several alternatives. Finally, we explore the interpretability of the learned representations using dimensionality reduction and evaluate the use of ensemble variance as an estimator of the epistemic uncertainty in our predictions of molecular peak absorption in solution. The prediction methods proposed herein can be integrated with active learning, generative modeling, and experimental workflows to enable the more rapid design of molecules with targeted optical properties.
format Online
Article
Text
id pubmed-8790778
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher The Royal Society of Chemistry
record_format MEDLINE/PubMed
spelling pubmed-8790778 2022-02-23 Multi-fidelity prediction of molecular optical peaks with deep learning Greenman, Kevin P. Green, William H. Gómez-Bombarelli, Rafael Chem Sci Chemistry
The Royal Society of Chemistry 2022-01-04 /pmc/articles/PMC8790778/ /pubmed/35211282 http://dx.doi.org/10.1039/d1sc05677h Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by/3.0/
title Multi-fidelity prediction of molecular optical peaks with deep learning
topic Chemistry