Pruned Machine Learning Models to Predict Aqueous Solubility

Solubility is a key metric for therapeutic compounds. Conversely, insoluble compounds cloud the accuracy of assays at all stages of chemical biology and drug discovery. Herein, we disclose naïve Bayesian classifier models to predict aqueous solubility. Publicly accessible aqueous s...

Bibliographic Details
Main Authors: Perryman, Alexander L., Inoyama, Daigo, Patel, Jimmy S., Ekins, Sean, Freundlich, Joel S.
Format: Online Article Text
Language: English
Published: American Chemical Society 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7364544/
https://www.ncbi.nlm.nih.gov/pubmed/32685821
http://dx.doi.org/10.1021/acsomega.0c01251
author Perryman, Alexander L.
Inoyama, Daigo
Patel, Jimmy S.
Ekins, Sean
Freundlich, Joel S.
author_sort Perryman, Alexander L.
collection PubMed
description Solubility is a key metric for therapeutic compounds. Conversely, insoluble compounds cloud the accuracy of assays at all stages of chemical biology and drug discovery. Herein, we disclose naïve Bayesian classifier models to predict aqueous solubility. Publicly accessible aqueous solubility data were used to create two full, or nonpruned, training sets. These two sets were also combined to create a full fused set, and a training set composed of a literature collation of solubility data was also considered as a reference. We tested different extents of data pruning on the training sets and constructed machine learning models that were evaluated with two independent, external test sets containing compounds different from those in the training sets. The best pruned and fused model was significantly more accurate than either the full model or the full fused model at predicting these external test sets. By carefully removing data from the training set, less information can be used to create more accurate machine learning models for aqueous solubility. This knowledge and the curated training sets should prove useful to future machine learning approaches.
format Online
Article
Text
id pubmed-7364544
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-7364544 2020-07-17 Pruned Machine Learning Models to Predict Aqueous Solubility Perryman, Alexander L. Inoyama, Daigo Patel, Jimmy S. Ekins, Sean Freundlich, Joel S. ACS Omega American Chemical Society 2020-07-01 /pmc/articles/PMC7364544/ /pubmed/32685821 http://dx.doi.org/10.1021/acsomega.0c01251 Text en Copyright © 2020 American Chemical Society. This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html), which permits copying and redistribution of the article or any adaptations for non-commercial purposes.
title Pruned Machine Learning Models to Predict Aqueous Solubility