
Machine learning meets pKa

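We present a small-molecule pKa prediction tool written entirely in Python. It predicts the macroscopic pKa value and is trained on a literature compilation of monoprotic compounds. Different machine learning models were tested, and random forest performed best in five-fold cross-validation (mean absolute error = 0.682, root mean squared error = 1.032, correlation coefficient r² = 0.82). We test our model on two external validation sets, where it performs comparably to Marvin and better than a recently published open-source model. Our Python tool and all data are freely available at https://github.com/czodrowskilab/Machine-learning-meets-pKa.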
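Five-fold cross-validation of a regression model, scored with the mean absolute error, root mean squared error and r², can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, a one-descriptor least-squares line stands in for the random forest, r² is taken here as the squared Pearson correlation, and every function name is hypothetical.

```python
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares line y = a*x + b (placeholder for the real model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def cross_validate(xs, ys, k=5, seed=0):
    """Return one out-of-fold prediction per sample."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a * xs[i] + b
    return preds


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    mt, mp = sum(y_true) / len(y_true), sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    stt = sum((t - mt) ** 2 for t in y_true)
    spp = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (stt * spp)


# Synthetic "descriptor" and "pKa" values, for illustration only.
rng = random.Random(1)
xs = [0.5 * i for i in range(40)]
ys = [2.0 + 0.8 * x + rng.gauss(0.0, 0.5) for x in xs]

preds = cross_validate(xs, ys, k=5)
print(f"MAE={mae(ys, preds):.3f} RMSE={rmse(ys, preds):.3f} r2={r_squared(ys, preds):.3f}")
```

Because every prediction comes from a model that never saw that sample during fitting, the three scores estimate performance on unseen compounds rather than goodness of fit on the training data.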


Bibliographic Details
Main Authors: Baltruschat, Marcel; Czodrowski, Paul
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096188/
https://www.ncbi.nlm.nih.gov/pubmed/32226607
http://dx.doi.org/10.12688/f1000research.22090.2
_version_ 1783510766035402752
author Baltruschat, Marcel
Czodrowski, Paul
author_facet Baltruschat, Marcel
Czodrowski, Paul
author_sort Baltruschat, Marcel
collection PubMed
description We present a small-molecule pKa prediction tool written entirely in Python. It predicts the macroscopic pKa value and is trained on a literature compilation of monoprotic compounds. Different machine learning models were tested, and random forest performed best in five-fold cross-validation (mean absolute error = 0.682, root mean squared error = 1.032, correlation coefficient r² = 0.82). We test our model on two external validation sets, where it performs comparably to Marvin and better than a recently published open-source model. Our Python tool and all data are freely available at https://github.com/czodrowskilab/Machine-learning-meets-pKa.
format Online
Article
Text
id pubmed-7096188
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-7096188 2020-03-27 Machine learning meets pKa Baltruschat, Marcel Czodrowski, Paul F1000Res Research Article We present a small-molecule pKa prediction tool written entirely in Python. It predicts the macroscopic pKa value and is trained on a literature compilation of monoprotic compounds. Different machine learning models were tested, and random forest performed best in five-fold cross-validation (mean absolute error = 0.682, root mean squared error = 1.032, correlation coefficient r² = 0.82). We test our model on two external validation sets, where it performs comparably to Marvin and better than a recently published open-source model. Our Python tool and all data are freely available at https://github.com/czodrowskilab/Machine-learning-meets-pKa. F1000 Research Limited 2020-04-27 /pmc/articles/PMC7096188/ /pubmed/32226607 http://dx.doi.org/10.12688/f1000research.22090.2 Text en Copyright: © 2020 Baltruschat M and Czodrowski P http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Baltruschat, Marcel
Czodrowski, Paul
Machine learning meets pKa
title Machine learning meets pKa
title_full Machine learning meets pKa
title_fullStr Machine learning meets pKa
title_full_unstemmed Machine learning meets pKa
title_short Machine learning meets pKa
title_sort machine learning meets pka
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096188/
https://www.ncbi.nlm.nih.gov/pubmed/32226607
http://dx.doi.org/10.12688/f1000research.22090.2
work_keys_str_mv AT baltruschatmarcel machinelearningmeetspka
AT czodrowskipaul machinelearningmeetspka