Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models

BACKGROUND: Standard approaches to address the performance of predictive models that used common statistical measurements for the entire data set provide an overview of the average performance of the models across the entire predictive space, but give little insight into applicability of the model a...

Bibliographic Details
Main authors: LeDonne, Norman C, Rissolo, Kevin, Bulgarelli, James, Tini, Leonard
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045354/
https://www.ncbi.nlm.nih.gov/pubmed/21299868
http://dx.doi.org/10.1186/1758-2946-3-7
_version_ 1782198818634203136
author LeDonne, Norman C
Rissolo, Kevin
Bulgarelli, James
Tini, Leonard
collection PubMed
description BACKGROUND: Standard approaches to address the performance of predictive models that used common statistical measurements for the entire data set provide an overview of the average performance of the models across the entire predictive space, but give little insight into applicability of the model across the prediction space. Guha and Van Drie recently proposed the use of structure-activity landscape index (SALI) curves via the SALI curve integral (SCI) as a means to map the predictive power of computational models within the predictive space. This approach evaluates model performance by assessing the accuracy of pairwise predictions, comparing compound pairs in a manner similar to that done by medicinal chemists. RESULTS: The SALI approach was used to evaluate the performance of continuous prediction models for MDR1-MDCK in vitro efflux potential. Efflux models were built with ADMET Predictor neural net, support vector machine, kernel partial least squares, and multiple linear regression engines, as well as SIMCA-P+ partial least squares, and random forest from Pipeline Pilot as implemented by AstraZeneca, using molecular descriptors from SimulationsPlus and AstraZeneca. CONCLUSION: The results indicate that the choice of training sets used to build the prediction models is of great importance in the resulting model quality and that the SCI values calculated for these models were very similar to their Kendall τ values, leading to our suggestion of an approach to use this SALI/SCI paradigm to evaluate predictive model performance that will allow more informed decisions regarding model utility. The use of SALI graphs and curves provides an additional level of quality assessment for predictive models.
format Text
id pubmed-3045354
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3045354 2011-02-26 J Cheminform Research Article BioMed Central 2011-02-07 /pmc/articles/PMC3045354/ /pubmed/21299868 http://dx.doi.org/10.1186/1758-2946-3-7 Text en Copyright ©2011 LeDonne et al; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models
topic Research Article