
Prediction of the Aqueous Solubility of Compounds Based on Light Gradient Boosting Machines with Molecular Fingerprints and the Cuckoo Search Algorithm

Aqueous solubility is one of the most important physicochemical properties in drug discovery. At present, the prediction of aqueous solubility of compounds is still a challenging problem. Machine learning has shown great potential in solubility prediction. Most machine learning mod...


Bibliographic Details
Main Authors: Li, Mengshan; Chen, Huijie; Zhang, Hang; Zeng, Ming; Chen, Bingsheng; Guan, Lixin
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9685740/
https://www.ncbi.nlm.nih.gov/pubmed/36440111
http://dx.doi.org/10.1021/acsomega.2c03885
_version_ 1784835579400159232
author Li, Mengshan
Chen, Huijie
Zhang, Hang
Zeng, Ming
Chen, Bingsheng
Guan, Lixin
collection PubMed
description Aqueous solubility is one of the most important physicochemical properties in drug discovery, yet predicting the aqueous solubility of compounds remains a challenging problem. Machine learning has shown great potential in solubility prediction. The performance of most machine learning models depends heavily on their hyperparameter settings and can be improved by tuning them. In this paper, we used MACCS fingerprints to represent structural features and optimized the hyperparameters of the light gradient boosting machine (LightGBM) with the cuckoo search (CS) algorithm. Using this representation and optimization, the CS-LightGBM model was established to predict the aqueous solubility of 2446 organic compounds, and its predictions were compared with those of six other machine learning models (RF, GBDT, XGBoost, LightGBM, SVR, and BO-LightGBM). The CS-LightGBM model outperformed the other six models, with an RMSE, MAE, and R² of 0.7785, 0.5117, and 0.8575, respectively. In addition, the model has good scalability and can be applied to solubility prediction problems in other fields, such as solvent selection and drug screening.
format Online
Article
Text
id pubmed-9685740
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-9685740 2022-11-25 ACS Omega. American Chemical Society 2022-11-08 /pmc/articles/PMC9685740/ /pubmed/36440111 http://dx.doi.org/10.1021/acsomega.2c03885 Text en © 2022 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works.
title Prediction of the Aqueous Solubility of Compounds Based on Light Gradient Boosting Machines with Molecular Fingerprints and the Cuckoo Search Algorithm
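The abstract describes tuning LightGBM hyperparameters with the cuckoo search (CS) algorithm. The record does not include the authors' code, so the following is only a minimal sketch of a generic cuckoo search with Lévy flights, not their implementation. The objective `toy_validation_rmse`, the two hyperparameters, and the ranges in `BOUNDS` are made-up stand-ins; in a real workflow the objective would train a LightGBM model on MACCS fingerprints and return its validation RMSE.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed (Levy-like) step using Mantegna's algorithm."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    # Guard against the (extremely unlikely) case v == 0.
    return u / (abs(v) or 1e-12) ** (1 / beta)


def clip(x, bounds):
    """Keep every coordinate inside its (low, high) range."""
    return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]


def cuckoo_search(objective, bounds, n_nests=15, pa=0.25, alpha=0.01,
                  n_iter=200, seed=0):
    """Minimize `objective` over the box `bounds`; returns (best_x, best_score)."""
    rng = random.Random(seed)

    def random_nest():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    nests = [random_nest() for _ in range(n_nests)]
    scores = [objective(n) for n in nests]

    for _ in range(n_iter):
        for i in range(n_nests):
            # A cuckoo lays an egg: Levy flight starting from nest i.
            cand = clip(
                [x + alpha * (hi - lo) * levy_step(rng)
                 for x, (lo, hi) in zip(nests[i], bounds)],
                bounds,
            )
            cand_score = objective(cand)
            # The egg replaces a randomly chosen nest only if it is better.
            j = rng.randrange(n_nests)
            if cand_score < scores[j]:
                nests[j], scores[j] = cand, cand_score

        # Abandon the worst fraction `pa` of nests and rebuild them at random.
        worst_first = sorted(range(n_nests), key=scores.__getitem__, reverse=True)
        for k in worst_first[:int(pa * n_nests)]:
            nests[k] = random_nest()
            scores[k] = objective(nests[k])

    best = scores.index(min(scores))
    return nests[best], scores[best]


# Hypothetical search space: (learning_rate, num_leaves). Ranges are assumptions.
BOUNDS = [(0.005, 0.3), (8, 256)]


def toy_validation_rmse(params):
    """Made-up smooth surface standing in for a real validation RMSE."""
    learning_rate, num_leaves = params
    return (math.log10(learning_rate) + 1.3) ** 2 + ((num_leaves - 40) / 50) ** 2 + 0.78
```

Calling `cuckoo_search(toy_validation_rmse, BOUNDS)` returns the best parameter vector found and its score. Integer hyperparameters such as the number of leaves would need rounding before being passed to a real model.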