
Boosting the predictive performance with aqueous solubility dataset curation

Intrinsic solubility is a critical property in the pharmaceutical industry that impacts in-vivo bioavailability of small molecule drugs. However, solubility prediction with Artificial Intelligence (AI) faces insufficient data, poor data quality, and no unified measurements for AI and physics-based a...


Bibliographic Details
Main Authors: Meng, Jintao, Chen, Peng, Wahib, Mohamed, Yang, Mingjun, Zheng, Liangzhen, Wei, Yanjie, Feng, Shengzhong, Liu, Wei
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8894363/
https://www.ncbi.nlm.nih.gov/pubmed/35241693
http://dx.doi.org/10.1038/s41597-022-01154-3
_version_ 1784662641672716288
author Meng, Jintao
Chen, Peng
Wahib, Mohamed
Yang, Mingjun
Zheng, Liangzhen
Wei, Yanjie
Feng, Shengzhong
Liu, Wei
author_facet Meng, Jintao
Chen, Peng
Wahib, Mohamed
Yang, Mingjun
Zheng, Liangzhen
Wei, Yanjie
Feng, Shengzhong
Liu, Wei
author_sort Meng, Jintao
collection PubMed
description Intrinsic solubility is a critical property in the pharmaceutical industry that impacts in-vivo bioavailability of small molecule drugs. However, solubility prediction with Artificial Intelligence (AI) faces insufficient data, poor data quality, and no unified measurements for AI and physics-based approaches. We collect 7 aqueous solubility datasets and present a dataset curation workflow. Evaluating the curated data with two expanded deep learning methods, we observe improved RMSE scores on all curated thermodynamic datasets. We also compare expanded Chemprop, enhanced with curated data, against a state-of-the-art physics-based approach using Pearson and Spearman correlation coefficients. Expanded Chemprop achieves similar performance, with a Pearson coefficient of 0.930 and a Spearman coefficient of 0.947. Steadily improving Pearson and Spearman values with increasing data points are also illustrated. In addition, the computational advantage of AI models enables quick evaluation of a large set of molecules during the hit identification or lead optimization stages, which supports further decision making within the time cycle of the drug discovery stage.
format Online
Article
Text
id pubmed-8894363
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8894363 2022-03-08 Boosting the predictive performance with aqueous solubility dataset curation Meng, Jintao Chen, Peng Wahib, Mohamed Yang, Mingjun Zheng, Liangzhen Wei, Yanjie Feng, Shengzhong Liu, Wei Sci Data Analysis Intrinsic solubility is a critical property in the pharmaceutical industry that impacts in-vivo bioavailability of small molecule drugs. However, solubility prediction with Artificial Intelligence (AI) faces insufficient data, poor data quality, and no unified measurements for AI and physics-based approaches. We collect 7 aqueous solubility datasets and present a dataset curation workflow. Evaluating the curated data with two expanded deep learning methods, we observe improved RMSE scores on all curated thermodynamic datasets. We also compare expanded Chemprop, enhanced with curated data, against a state-of-the-art physics-based approach using Pearson and Spearman correlation coefficients. Expanded Chemprop achieves similar performance, with a Pearson coefficient of 0.930 and a Spearman coefficient of 0.947. Steadily improving Pearson and Spearman values with increasing data points are also illustrated. In addition, the computational advantage of AI models enables quick evaluation of a large set of molecules during the hit identification or lead optimization stages, which supports further decision making within the time cycle of the drug discovery stage. Nature Publishing Group UK 2022-03-03 /pmc/articles/PMC8894363/ /pubmed/35241693 http://dx.doi.org/10.1038/s41597-022-01154-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Analysis
Meng, Jintao
Chen, Peng
Wahib, Mohamed
Yang, Mingjun
Zheng, Liangzhen
Wei, Yanjie
Feng, Shengzhong
Liu, Wei
Boosting the predictive performance with aqueous solubility dataset curation
title Boosting the predictive performance with aqueous solubility dataset curation
title_full Boosting the predictive performance with aqueous solubility dataset curation
title_fullStr Boosting the predictive performance with aqueous solubility dataset curation
title_full_unstemmed Boosting the predictive performance with aqueous solubility dataset curation
title_short Boosting the predictive performance with aqueous solubility dataset curation
title_sort boosting the predictive performance with aqueous solubility dataset curation
topic Analysis
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8894363/
https://www.ncbi.nlm.nih.gov/pubmed/35241693
http://dx.doi.org/10.1038/s41597-022-01154-3
work_keys_str_mv AT mengjintao boostingthepredictiveperformancewithaqueoussolubilitydatasetcuration
AT chenpeng boostingthepredictiveperformancewithaqueoussolubilitydatasetcuration
AT wahibmohamed boostingthepredictiveperformancewithaqueoussolubilitydatasetcuration
AT yangmingjun boostingthepredictiveperformancewithaqueoussolubilitydatasetcuration
AT zhengliangzhen boostingthepredictiveperformancewithaqueoussolubilitydatasetcuration
AT weiyanjie boostingthepredictiveperformancewithaqueoussolubilitydatasetcuration
AT fengshengzhong boostingthepredictiveperformancewithaqueoussolubilitydatasetcuration
AT liuwei boostingthepredictiveperformancewithaqueoussolubilitydatasetcuration
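
The description above reports results as RMSE and as Pearson and Spearman correlation coefficients. As a rough illustration of what those three metrics measure, the following is a minimal sketch in plain Python. It is not the authors' evaluation code; the function names and the sample log-solubility values are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured and predicted log-solubility values (illustration only).
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -1.8, -3.4, -4.1]

print("RMSE:", rmse(measured, predicted))
print("Pearson:", pearson(measured, predicted))
print("Spearman:", spearman(measured, predicted))
```

RMSE penalizes the size of each error, Pearson measures linear agreement, and Spearman measures only whether the predicted ordering of molecules matches the measured ordering, which is why the two correlation values can differ on the same predictions.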