Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy...
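Main authors: | Panapitiya, Gihan; Girard, Michael; Hollas, Aaron; Sepulveda, Jonathan; Murugesan, Vijayakumar; Wang, Wei; Saldanha, Emily |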
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9096921/ https://www.ncbi.nlm.nih.gov/pubmed/35571767 http://dx.doi.org/10.1021/acsomega.2c00642 |
author | Panapitiya, Gihan; Girard, Michael; Hollas, Aaron; Sepulveda, Jonathan; Murugesan, Vijayakumar; Wang, Wei; Saldanha, Emily |
---|---|
collection | PubMed |
description | Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goals of this study are to assess current deep learning methods for solubility prediction, to develop a general model capable of predicting the solubility of a broad range of organic molecules, and to understand the impact of data properties, molecular representation, and modeling architecture on predictive performance. Using the largest currently available solubility data set, we implement deep learning-based models to predict solubility from the molecular structure. We explore several molecular representations, including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional atomic coordinates, using four neural network architectures: fully connected neural networks, recurrent neural networks, graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also performing well. We perform extensive error analysis to understand the molecular properties that influence model performance, feature analysis to understand which information about the molecular structure is most valuable for prediction, and a transfer learning and data size study to understand the impact of data availability on model performance. |
format | Online Article Text |
id | pubmed-9096921 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
source | ACS Omega, American Chemical Society, 2022-04-25 (PMC9096921; PubMed 35571767; http://dx.doi.org/10.1021/acsomega.2c00642) |
rights | © 2022 The Authors. Published by American Chemical Society under the CC BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works. |
title | Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9096921/ https://www.ncbi.nlm.nih.gov/pubmed/35571767 http://dx.doi.org/10.1021/acsomega.2c00642 |