
Open-source QSAR models for pKa prediction using multiple machine learning approaches

BACKGROUND: The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity proper...


Bibliographic Details
Main authors: Mansouri, Kamel; Cariello, Neal F.; Korotcov, Alexandru; Tkachenko, Valery; Grulke, Chris M.; Sprankle, Catherine S.; Allen, David; Casey, Warren M.; Kleinstreuer, Nicole C.; Williams, Antony J.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2019
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749653/
https://www.ncbi.nlm.nih.gov/pubmed/33430972
http://dx.doi.org/10.1186/s13321-019-0384-1
collection PubMed
description BACKGROUND: The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction.
METHODS: The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure–activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different approaches to modeling, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods: (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB), and (3) deep neural networks (DNN).
RESULTS: The three methods delivered comparable performances on the training and test sets with a root-mean-squared error (RMSE) around 1.5 and a coefficient of determination (R²) around 0.80. Two commercial pKa predictors from ACD/Labs and ChemAxon were used to benchmark the three best models developed in this work, and performance of our models compared favorably to the commercial products.
CONCLUSIONS: This work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data, and provided as free and open-source software on GitHub.
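The two metrics quoted in the RESULTS paragraph can be illustrated with a minimal sketch. It assumes the common definitions RMSE = sqrt(mean squared residual) and R² = 1 − SS_res/SS_tot; the record does not say which R² variant the authors used. The function names and the pKa values below are hypothetical and are not data from the article.

```python
import math


def rmse(observed, predicted):
    """Root-mean-squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical experimental and predicted pKa values, for illustration only.
observed_pka = [4.0, 9.0, 7.0, 2.0]
predicted_pka = [5.0, 8.0, 7.0, 3.0]

print(f"RMSE = {rmse(observed_pka, predicted_pka):.3f}")
print(f"R2   = {r_squared(observed_pka, predicted_pka):.3f}")
```

RMSE is expressed in pKa units, so a value around 1.5 corresponds to a typical error of roughly one and a half log units, while R² is unitless and depends on the spread of the observed values.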
format Online
Article
Text
id pubmed-6749653
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-6749653 2019-09-23 J Cheminform Research Article Springer International Publishing 2019-09-18 /pmc/articles/PMC6749653/ /pubmed/33430972 http://dx.doi.org/10.1186/s13321-019-0384-1 Text en © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article