A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood

The ability to define the regions of chemical space where a predictive model can be safely used is a necessary condition to assure the reliability of new predictions. This implies that reliability must be determined across chemical space in the attempt to localize “safe” and “unsafe” regions for pre...

Bibliographic Details
Main authors:	Aniceto, Natália; Freitas, Alex A.; Bender, Andreas; Ghafourian, Taravat
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2016
Subjects:	Methodology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5395519/ http://dx.doi.org/10.1186/s13321-016-0182-y

author	Aniceto, Natália; Freitas, Alex A.; Bender, Andreas; Ghafourian, Taravat
collection	PubMed
description	The ability to define the regions of chemical space where a predictive model can be safely used is a necessary condition for assuring the reliability of new predictions. This implies that reliability must be determined across chemical space in order to localize “safe” and “unsafe” regions for prediction. We therefore devised an applicability domain technique that addresses the data locally instead of handling it as a whole: the reliability-density neighbourhood (RDN). The main novelty of this method is that it characterizes each individual training instance according to the density of its neighbourhood in the training set, as well as its individual bias and precision. By scanning through chemical space (iteratively increasing the applicability domain area), it was observed that new test compounds are successively included in the applicability domain region in a manner that strongly correlates with their predictive performance. This allows local reliability to be mapped across different locations in the training set space, and thus allows regions where the model has low reliability to be identified. The method also showed matching profiles between two external sets, which indicates that it performs robustly with new data. Another novel aspect of this technique is that it is paired with a specific feature selection algorithm. The impact of the feature set used was therefore studied: the top 20 features selected by ReliefF yielded the best results, as opposed to using the model’s features or the entire feature set, as is commonly done. As a third novel aspect, this work proposes a new scoring function to help evaluate the quality of an applicability domain profile (i.e., the curve of accuracy vs. the applicability domain measure in question). Overall, the RDN proved to be a promising method that can correctly sort new instances according to predictive performance. As a result, this technique can be received by an end-user as proof of concept for the performance of a QSAR model on new data, thus promoting the user’s trust in the QSAR output. [Figure: see text] ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0182-y) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5395519
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-5395519 2017-05-05 J Cheminform Methodology Springer International Publishing 2016-12-03 /pmc/articles/PMC5395519/ http://dx.doi.org/10.1186/s13321-016-0182-y Text en © The Author(s) 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5395519/ http://dx.doi.org/10.1186/s13321-016-0182-y

Similar items