Cargando…

Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database

The accurate prediction of solubility of drugs is still problematic. It was thought for a long time that shortfalls had been due the lack of high-quality solubility data from the chemical space of drugs. This study considers the quality of solubility data, particularly of ionizable drugs. A database...

Descripción completa

Detalles Bibliográficos
Autor principal: Avdeef, Alex
Formato: Online Artículo Texto
Lenguaje:English
Publicado: International Association of Physical Chemists 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8915599/
https://www.ncbi.nlm.nih.gov/pubmed/35299775
http://dx.doi.org/10.5599/admet.766
_version_ 1784668067471556608
author Avdeef, Alex
author_facet Avdeef, Alex
author_sort Avdeef, Alex
collection PubMed
description The accurate prediction of solubility of drugs is still problematic. It was thought for a long time that shortfalls had been due the lack of high-quality solubility data from the chemical space of drugs. This study considers the quality of solubility data, particularly of ionizable drugs. A database is described, comprising 6355 entries of intrinsic solubility for 3014 different molecules, drawing on 1325 citations. In an earlier publication, many factors affecting the quality of the measurement had been discussed, and suggestions were offered to improve ways of extracting more reliable information from legacy data. Many of the suggestions have been implemented in this study. By correcting solubility for ionization (i.e., deriving intrinsic solubility, S(0)) and by normalizing temperature (by transforming measurements performed in the range 10-50 °C to 25 °C), it can now be estimated that the average interlaboratory reproducibility is 0.17 log unit. Empirical methods to predict solubility at best have hovered around the root mean square error (RMSE) of 0.6 log unit. Three prediction methods are compared here: (a) Yalkowsky’s general solubility equation (GSE), (b) Abraham solvation equation (ABSOLV), and (c) Random Forest regression (RFR) statistical machine learning. The latter two methods were trained using the new database. The RFR method outperforms the other two models, as anticipated. However, the ability to predict the solubility of drugs to the level of the quality of data is still out of reach. The data quality is not the limiting factor in prediction. The statistical machine learning methodologies are probably up to the task. Possibly what’s missing are solubility data from a few sparsely-covered chemical space of drugs (particularly of research compounds). Also, new descriptors which can better differentiate the factors affecting solubility between molecules could be critical for narrowing the gap between the accuracy of the prediction models and that of the experimental data.
format Online
Article
Text
id pubmed-8915599
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher International Association of Physical Chemists
record_format MEDLINE/PubMed
spelling pubmed-89155992022-03-16 Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database Avdeef, Alex ADMET DMPK Original Scientific Papers The accurate prediction of solubility of drugs is still problematic. It was thought for a long time that shortfalls had been due the lack of high-quality solubility data from the chemical space of drugs. This study considers the quality of solubility data, particularly of ionizable drugs. A database is described, comprising 6355 entries of intrinsic solubility for 3014 different molecules, drawing on 1325 citations. In an earlier publication, many factors affecting the quality of the measurement had been discussed, and suggestions were offered to improve ways of extracting more reliable information from legacy data. Many of the suggestions have been implemented in this study. By correcting solubility for ionization (i.e., deriving intrinsic solubility, S(0)) and by normalizing temperature (by transforming measurements performed in the range 10-50 °C to 25 °C), it can now be estimated that the average interlaboratory reproducibility is 0.17 log unit. Empirical methods to predict solubility at best have hovered around the root mean square error (RMSE) of 0.6 log unit. Three prediction methods are compared here: (a) Yalkowsky’s general solubility equation (GSE), (b) Abraham solvation equation (ABSOLV), and (c) Random Forest regression (RFR) statistical machine learning. The latter two methods were trained using the new database. The RFR method outperforms the other two models, as anticipated. However, the ability to predict the solubility of drugs to the level of the quality of data is still out of reach. The data quality is not the limiting factor in prediction. The statistical machine learning methodologies are probably up to the task. Possibly what’s missing are solubility data from a few sparsely-covered chemical space of drugs (particularly of research compounds). Also, new descriptors which can better differentiate the factors affecting solubility between molecules could be critical for narrowing the gap between the accuracy of the prediction models and that of the experimental data. International Association of Physical Chemists 2020-03-04 /pmc/articles/PMC8915599/ /pubmed/35299775 http://dx.doi.org/10.5599/admet.766 Text en Copyright © 2020 by the authors. https://creativecommons.org/licenses/by/4.0/This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) ).
spellingShingle Original Scientific Papers
Avdeef, Alex
Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database
title Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database
title_full Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database
title_fullStr Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database
title_full_unstemmed Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database
title_short Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database
title_sort prediction of aqueous intrinsic solubility of druglike molecules using random forest regression trained with wiki-ps0 database
topic Original Scientific Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8915599/
https://www.ncbi.nlm.nih.gov/pubmed/35299775
http://dx.doi.org/10.5599/admet.766
work_keys_str_mv AT avdeefalex predictionofaqueousintrinsicsolubilityofdruglikemoleculesusingrandomforestregressiontrainedwithwikips0database