
Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models


Bibliographic Details
Main authors:	LeDonne, Norman C, Rissolo, Kevin, Bulgarelli, James, Tini, Leonard
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045354/ https://www.ncbi.nlm.nih.gov/pubmed/21299868 http://dx.doi.org/10.1186/1758-2946-3-7

collection	PubMed
description	BACKGROUND: Standard approaches to assessing predictive models apply common statistical measures to the entire data set. They give an overview of a model's average performance across the predictive space but little insight into the applicability of the model across that space. Guha and Van Drie recently proposed the use of structure-activity landscape index (SALI) curves, via the SALI curve integral (SCI), as a means to map the predictive power of computational models within the predictive space. This approach evaluates model performance by assessing the accuracy of pairwise predictions, comparing compound pairs much as medicinal chemists do. RESULTS: The SALI approach was used to evaluate continuous prediction models for MDR1-MDCK in vitro efflux potential. Efflux models were built with the ADMET Predictor neural net, support vector machine, kernel partial least squares, and multiple linear regression engines, with SIMCA-P+ partial least squares, and with random forest from Pipeline Pilot as implemented by AstraZeneca, using molecular descriptors from SimulationsPlus and AstraZeneca. CONCLUSION: The results indicate that the choice of training set is of great importance to the quality of the resulting model, and that the SCI values calculated for these models were very similar to their Kendall τ values. On this basis we suggest an approach that uses the SALI/SCI paradigm to evaluate predictive model performance and so allow more informed decisions about model utility. The use of SALI graphs and curves provides an additional level of quality assessment for predictive models.
id	pubmed-3045354
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
record	pubmed-3045354, 2011-02-26
journal	J Cheminform
published	BioMed Central, 2011-02-07
rights	Copyright ©2011 LeDonne et al; licensee Chemistry Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
