
QSAR with experimental and predictive distributions: an information theoretic approach for assessing model quality

We propose that quantitative structure–activity relationship (QSAR) predictions should be explicitly represented as predictive (probability) distributions. If both predictions and experimental measurements are treated as probability distributions, the quality of a set of predictive distributions out...


Bibliographic Details
Main authors: Wood, David J.; Carlsson, Lars; Eklund, Martin; Norinder, Ulf; Stålring, Jonna
Format: Online Article Text
Language: English
Published: Springer Netherlands 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3639359/
https://www.ncbi.nlm.nih.gov/pubmed/23504478
http://dx.doi.org/10.1007/s10822-013-9639-5
author Wood, David J.
Carlsson, Lars
Eklund, Martin
Norinder, Ulf
Stålring, Jonna
author_sort Wood, David J.
collection PubMed
description We propose that quantitative structure–activity relationship (QSAR) predictions should be explicitly represented as predictive (probability) distributions. If both predictions and experimental measurements are treated as probability distributions, the quality of a set of predictive distributions output by a model can be assessed with Kullback–Leibler (KL) divergence: a widely used information theoretic measure of the distance between two probability distributions. We have assessed a range of different machine learning algorithms and error estimation methods for producing predictive distributions with an analysis against three of AstraZeneca’s global DMPK datasets. Using the KL-divergence framework, we have identified a few combinations of algorithms that produce accurate and valid compound-specific predictive distributions. These methods use reliability indices to assign predictive distributions to the predictions output by QSAR models so that reliable predictions have tight distributions and vice versa. Finally, we show how valid predictive distributions can be used to estimate the probability that a test compound has properties that hit single- or multi-objective target profiles. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s10822-013-9639-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-3639359
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Springer Netherlands
record_format MEDLINE/PubMed
spelling pubmed-3639359 2013-04-30 QSAR with experimental and predictive distributions: an information theoretic approach for assessing model quality Wood, David J.; Carlsson, Lars; Eklund, Martin; Norinder, Ulf; Stålring, Jonna J Comput Aided Mol Des Article
Springer Netherlands 2013-03-16 2013 /pmc/articles/PMC3639359/ /pubmed/23504478 http://dx.doi.org/10.1007/s10822-013-9639-5 Text en © The Author(s) 2013 https://creativecommons.org/licenses/by/2.0/ Open Access. This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
title QSAR with experimental and predictive distributions: an information theoretic approach for assessing model quality
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3639359/
https://www.ncbi.nlm.nih.gov/pubmed/23504478
http://dx.doi.org/10.1007/s10822-013-9639-5