Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions

BACKGROUND: With the growing popularity of using QSAR predictions for regulatory purposes, such predictive models are now required to be strictly validated, an essential feature of which is a clearly defined Applicability Domain (AD) for the model. Although in recent years several different...

Bibliographic Details
Main authors: Sahigara, Faizan; Ballabio, Davide; Todeschini, Roberto; Consonni, Viviana
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679843/
https://www.ncbi.nlm.nih.gov/pubmed/23721648
http://dx.doi.org/10.1186/1758-2946-5-27
author Sahigara, Faizan
Ballabio, Davide
Todeschini, Roberto
Consonni, Viviana
collection PubMed
description BACKGROUND: With the growing popularity of using QSAR predictions for regulatory purposes, such predictive models are now required to be strictly validated, an essential feature of which is a clearly defined Applicability Domain (AD) for the model. Although in recent years several different approaches have been proposed to address this goal, no optimal approach to define the model’s AD has yet been recognized. RESULTS: This study proposes a novel descriptor-based AD method which accounts for the data distribution and exploits the k-Nearest Neighbours (kNN) principle to derive a heuristic decision rule. The proposed method is a three-stage procedure that addresses several key aspects relevant to judging the reliability of QSAR predictions. Inspired by the adaptive kernel method for probability density function estimation, the first stage of the approach defines a pattern of thresholds corresponding to the training samples, and these thresholds are later used to derive the decision rule. The second stage defines the criterion that decides whether a given test sample is retained within the AD. Finally, the last stage assesses the reliability of the derived results, taking model statistics and prediction error into account. CONCLUSIONS: The proposed approach is a novel strategy that integrates the kNN principle to define the AD of QSAR models. Relevant features that characterize the proposed AD approach include: a) adaptability to the local density of samples, useful when the underlying multivariate distribution is asymmetric, with wide regions of low data density; b) unlike several kernel density estimators (KDE), effectiveness in high-dimensional spaces as well; c) low sensitivity to the smoothing parameter k; and d) versatility to implement various distance measures. The results derived on a case study provided a clear understanding of how the approach works and how it defines the model’s AD for reliable predictions.
format Online
Article
Text
id pubmed-3679843
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3679843 2013-06-25 Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions Sahigara, Faizan Ballabio, Davide Todeschini, Roberto Consonni, Viviana J Cheminform Research Article BACKGROUND: With the growing popularity of using QSAR predictions for regulatory purposes, such predictive models are now required to be strictly validated, an essential feature of which is a clearly defined Applicability Domain (AD) for the model. Although in recent years several different approaches have been proposed to address this goal, no optimal approach to define the model’s AD has yet been recognized. RESULTS: This study proposes a novel descriptor-based AD method which accounts for the data distribution and exploits the k-Nearest Neighbours (kNN) principle to derive a heuristic decision rule. The proposed method is a three-stage procedure that addresses several key aspects relevant to judging the reliability of QSAR predictions. Inspired by the adaptive kernel method for probability density function estimation, the first stage of the approach defines a pattern of thresholds corresponding to the training samples, and these thresholds are later used to derive the decision rule. The second stage defines the criterion that decides whether a given test sample is retained within the AD. Finally, the last stage assesses the reliability of the derived results, taking model statistics and prediction error into account. CONCLUSIONS: The proposed approach is a novel strategy that integrates the kNN principle to define the AD of QSAR models.
Relevant features that characterize the proposed AD approach include: a) adaptability to the local density of samples, useful when the underlying multivariate distribution is asymmetric, with wide regions of low data density; b) unlike several kernel density estimators (KDE), effectiveness in high-dimensional spaces as well; c) low sensitivity to the smoothing parameter k; and d) versatility to implement various distance measures. The results derived on a case study provided a clear understanding of how the approach works and how it defines the model’s AD for reliable predictions. BioMed Central 2013-05-30 /pmc/articles/PMC3679843/ /pubmed/23721648 http://dx.doi.org/10.1186/1758-2946-5-27 Text en Copyright © 2013 Sahigara et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions
topic Research Article
work_keys_str_mv AT sahigarafaizan defininganovelknearestneighboursapproachtoassesstheapplicabilitydomainofaqsarmodelforreliablepredictions
AT ballabiodavide defininganovelknearestneighboursapproachtoassesstheapplicabilitydomainofaqsarmodelforreliablepredictions
AT todeschiniroberto defininganovelknearestneighboursapproachtoassesstheapplicabilitydomainofaqsarmodelforreliablepredictions
AT consonniviviana defininganovelknearestneighboursapproachtoassesstheapplicabilitydomainofaqsarmodelforreliablepredictions
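Illustrative sketch (not part of the catalogued record): the abstract describes a first stage that assigns a distance threshold to each training sample and a second stage that decides whether a test sample falls inside the AD. The Python below is a minimal sketch of that idea under stated assumptions. The abstract does not give the formulas, so the specific rules used here are assumptions: a reference distance taken as Q3 + 1.5 x IQR of the mean k-nearest-neighbour distances, per-sample thresholds taken as the mean distance to neighbours within that reference distance, and Euclidean distance. The function names are hypothetical. The third stage (reliability assessment from model statistics and prediction error) is not sketched.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distances between rows of a (n, p) and rows of b (m, p)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def fit_thresholds(train, k):
    """Stage 1 (assumed rule): one distance threshold per training sample.

    Requires k < number of training samples.
    """
    d = pairwise_distances(train, train)
    np.fill_diagonal(d, np.inf)              # a sample is not its own neighbour
    d_sorted = np.sort(d, axis=1)
    mean_knn = d_sorted[:, :k].mean(axis=1)  # mean distance to the k nearest
    q1, q3 = np.percentile(mean_knn, [25, 75])
    ref = q3 + 1.5 * (q3 - q1)               # assumed reference distance
    thresholds = np.full(len(train), np.nan)
    for i, row in enumerate(d_sorted):
        close = row[row <= ref]              # neighbours within the reference
        if close.size:
            thresholds[i] = close.mean()
    # Assumption: isolated samples (no neighbour within ref) receive the
    # smallest threshold found among the other training samples.
    thresholds[np.isnan(thresholds)] = np.nanmin(thresholds)
    return thresholds

def in_domain(test, train, thresholds):
    """Stage 2 (assumed rule): a test sample is inside the AD if it lies
    within the threshold of at least one training sample."""
    d = pairwise_distances(test, train)      # shape (n_test, n_train)
    return (d <= thresholds[None, :]).any(axis=1)
```

A possible usage: call fit_thresholds once on the training descriptors, then pass new samples to in_domain to obtain one boolean per sample. Because each training sample carries its own threshold, dense regions yield tight thresholds and sparse regions yield looser ones, which is the density adaptivity the abstract mentions.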