Uniting Cheminformatics and Chemical Theory To Predict the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules

We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantita...

Bibliographic Details
Main authors: McDonagh, James L., Nath, Neetika, De Ferrari, Luna, van Mourik, Tanja, Mitchell, John B. O.
Format: Online Article Text
Language: English
Published: American Chemical Society 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965570/
https://www.ncbi.nlm.nih.gov/pubmed/24564264
http://dx.doi.org/10.1021/ci4005805
author McDonagh, James L.
Nath, Neetika
De Ferrari, Luna
van Mourik, Tanja
Mitchell, John B. O.
collection PubMed
description We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ∼1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units.
format Online
Article
Text
id pubmed-3965570
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-3965570 2014-03-25 J Chem Inf Model American Chemical Society 2014-02-24 2014-03-24 Text en Copyright © 2014 American Chemical Society Terms of Use CC-BY (http://pubs.acs.org/page/policy/authorchoice_ccby_termsofuse.html)
title Uniting Cheminformatics and Chemical Theory To Predict the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965570/
https://www.ncbi.nlm.nih.gov/pubmed/24564264
http://dx.doi.org/10.1021/ci4005805