
Enhancing Carbon Acid pK(a) Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values

The prediction of the aqueous pK(a) of carbon acids by Quantitative Structure Property Relationship or cheminformatics-based methods is a rather arduous problem. Primarily, there are insufficient high-quality experimental data points measured in homogeneous conditions to allow for a good global mode...


Bibliographic Details
Main authors: Plante, Jeffrey; Caine, Beth A.; Popelier, Paul L. A.
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7922142/
https://www.ncbi.nlm.nih.gov/pubmed/33671348
http://dx.doi.org/10.3390/molecules26041048
author Plante, Jeffrey
Caine, Beth A.
Popelier, Paul L. A.
collection PubMed
description The prediction of the aqueous pK(a) of carbon acids by Quantitative Structure Property Relationship or cheminformatics-based methods is a rather arduous problem. Primarily, there are insufficient high-quality experimental data points measured in homogeneous conditions to allow for a good global model to be generated. In our computationally efficient pK(a) prediction method, we generate an atom-type feature vector, called a distance spectrum, from the assigned ionisation atom, and learn coefficients for those atom-types that show the impact each atom-type has on the pK(a) of the ionisable centre. In the current work, we augment our dataset with pK(a) values from a series of high performing local models derived from the Ab Initio Bond Lengths method (AIBL). We find that, in distilling the knowledge available from multiple models into one general model, the prediction error for an external test set is reduced compared to that using literature experimental data alone.
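The distance-spectrum idea in the description above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that atom types are plain element symbols, that distance means the number of bonds along the shortest path from the ionisable atom, and that the learned coefficients enter a simple linear sum. The function names `distance_spectrum` and `predict_pka` and the `max_distance` parameter are hypothetical.

```python
from collections import Counter, deque


def distance_spectrum(atom_types, bonds, centre, max_distance=3):
    """Count atom types in each bond-distance shell around `centre`.

    Assumption: atom types are element symbols and distance is the
    shortest bond-path length; the published method may define both differently.
    """
    # Build an adjacency list from the bond list.
    neighbours = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Breadth-first search yields the shortest bond-path distance to each atom.
    distance = {centre: 0}
    queue = deque([centre])
    while queue:
        current = queue.popleft()
        if distance[current] == max_distance:
            continue  # do not expand beyond the outermost shell
        for nxt in neighbours[current]:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)

    # Feature vector: one count per (atom type, distance) pair, centre excluded.
    spectrum = Counter()
    for atom, d in distance.items():
        if d > 0:
            spectrum[(atom_types[atom], d)] += 1
    return spectrum


def predict_pka(spectrum, coefficients, intercept):
    """Hypothetical linear model: intercept plus coefficient times count."""
    return intercept + sum(
        coefficients.get(key, 0.0) * count for key, count in spectrum.items()
    )


# Heavy atoms of acetone, CH3-C(=O)-CH3, with atom 0 as the assumed ionisable carbon.
atom_types = ["C", "C", "O", "C"]
bonds = [(0, 1), (1, 2), (1, 3)]
spectrum = distance_spectrum(atom_types, bonds, centre=0)
```

In a model of this shape, each coefficient would express how much one atom type at one distance shifts the predicted pK(a) of the ionisable centre. Augmenting the training set, as the description reports, would then amount to adding rows whose target values come from the AIBL local models rather than from experiment.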
format Online
Article
Text
id pubmed-7922142
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7922142 2021-03-03 Molecules Article MDPI 2021-02-17 /pmc/articles/PMC7922142/ /pubmed/33671348 http://dx.doi.org/10.3390/molecules26041048 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Enhancing Carbon Acid pK(a) Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7922142/
https://www.ncbi.nlm.nih.gov/pubmed/33671348
http://dx.doi.org/10.3390/molecules26041048