
Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction

Bibliographic Details
Main authors: Panapitiya, Gihan; Girard, Michael; Hollas, Aaron; Sepulveda, Jonathan; Murugesan, Vijayakumar; Wang, Wei; Saldanha, Emily
Format: Online Article, Text
Language: English
Published: American Chemical Society, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9096921/
https://www.ncbi.nlm.nih.gov/pubmed/35571767
http://dx.doi.org/10.1021/acsomega.2c00642
Collection: PubMed
Abstract: Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite decades of effort, developing a solubility prediction model with satisfactory accuracy for many of these applications remains challenging. The goals of this study are to assess current deep learning methods for solubility prediction, develop a general model capable of predicting the solubility of a broad range of organic molecules, and understand the impact of data properties, molecular representation, and modeling architecture on predictive performance. Using the largest currently available solubility data set, we implement deep learning-based models to predict solubility from molecular structure and explore several molecular representations, including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional atomic coordinates, using four neural network architectures: fully connected neural networks, recurrent neural networks, graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also performing well. We perform extensive error analysis to understand the molecular properties that influence model performance, feature analysis to understand which information about the molecular structure is most valuable for prediction, and a transfer learning and data size study to understand the impact of data availability on model performance.
ID: pubmed-9096921
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: ACS Omega
Date: 2022-04-25
Copyright: © 2022 The Authors. Published by American Chemical Society.
License: CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/). Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works.