Cargando…

Boosting the predictive performance with aqueous solubility dataset curation

Intrinsic solubility is a critical property in pharmaceutical industry that impacts in-vivo bioavailability of small molecule drugs. However, solubility prediction with Artificial Intelligence(AI) are facing insufficient data, poor data quality, and no unified measurements for AI and physics-based a...

Descripción completa

Detalles Bibliográficos
Autores principales: Meng, Jintao, Chen, Peng, Wahib, Mohamed, Yang, Mingjun, Zheng, Liangzhen, Wei, Yanjie, Feng, Shengzhong, Liu, Wei
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group UK 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8894363/
https://www.ncbi.nlm.nih.gov/pubmed/35241693
http://dx.doi.org/10.1038/s41597-022-01154-3
Descripción
Sumario:Intrinsic solubility is a critical property in pharmaceutical industry that impacts in-vivo bioavailability of small molecule drugs. However, solubility prediction with Artificial Intelligence(AI) are facing insufficient data, poor data quality, and no unified measurements for AI and physics-based approaches. We collect 7 aqueous solubility datasets, and present a dataset curation workflow. Evaluating the curated data with two expanded deep learning methods, improved RMSE scores on all curated thermodynamic datasets are observed. We also compare expanded Chemprop enhanced with curated data and state-of-art physics-based approach using pearson and spearman correlation coefficients. A similar performance on pearson with 0.930 and spearman with 0.947 from expanded Chemprop is achieved. A steadily improved pearson and spearman values with increasing data points are also illustrated. Besides that, the computation advantage of AI models enables quick evaluation of a large set of molecules during the hit identification or lead optimization stages, which helps further decision making within the time cycle at drug discovery stage.