Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models
BACKGROUND: Standard approaches to address the performance of predictive models that used common statistical measurements for the entire data set provide an overview of the average performance of the models across the entire predictive space, but give little insight into applicability of the model a...
Main Authors: | LeDonne, Norman C; Rissolo, Kevin; Bulgarelli, James; Tini, Leonard |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045354/ https://www.ncbi.nlm.nih.gov/pubmed/21299868 http://dx.doi.org/10.1186/1758-2946-3-7 |
_version_ | 1782198818634203136 |
---|---|
author | LeDonne, Norman C Rissolo, Kevin Bulgarelli, James Tini, Leonard |
author_sort | LeDonne, Norman C |
collection | PubMed |
description | BACKGROUND: Standard approaches to address the performance of predictive models that used common statistical measurements for the entire data set provide an overview of the average performance of the models across the entire predictive space, but give little insight into applicability of the model across the prediction space. Guha and Van Drie recently proposed the use of structure-activity landscape index (SALI) curves via the SALI curve integral (SCI) as a means to map the predictive power of computational models within the predictive space. This approach evaluates model performance by assessing the accuracy of pairwise predictions, comparing compound pairs in a manner similar to that done by medicinal chemists. RESULTS: The SALI approach was used to evaluate the performance of continuous prediction models for MDR1-MDCK in vitro efflux potential. Efflux models were built with ADMET Predictor neural net, support vector machine, kernel partial least squares, and multiple linear regression engines, as well as SIMCA-P+ partial least squares, and random forest from Pipeline Pilot as implemented by AstraZeneca, using molecular descriptors from SimulationsPlus and AstraZeneca. CONCLUSION: The results indicate that the choice of training sets used to build the prediction models is of great importance in the resulting model quality and that the SCI values calculated for these models were very similar to their Kendall τ values, leading to our suggestion of an approach to use this SALI/SCI paradigm to evaluate predictive model performance that will allow more informed decisions regarding model utility. The use of SALI graphs and curves provides an additional level of quality assessment for predictive models. |
format | Text |
id | pubmed-3045354 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-30453542011-02-26 Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models LeDonne, Norman C Rissolo, Kevin Bulgarelli, James Tini, Leonard J Cheminform Research Article BACKGROUND: Standard approaches to address the performance of predictive models that used common statistical measurements for the entire data set provide an overview of the average performance of the models across the entire predictive space, but give little insight into applicability of the model across the prediction space. Guha and Van Drie recently proposed the use of structure-activity landscape index (SALI) curves via the SALI curve integral (SCI) as a means to map the predictive power of computational models within the predictive space. This approach evaluates model performance by assessing the accuracy of pairwise predictions, comparing compound pairs in a manner similar to that done by medicinal chemists. RESULTS: The SALI approach was used to evaluate the performance of continuous prediction models for MDR1-MDCK in vitro efflux potential. Efflux models were built with ADMET Predictor neural net, support vector machine, kernel partial least squares, and multiple linear regression engines, as well as SIMCA-P+ partial least squares, and random forest from Pipeline Pilot as implemented by AstraZeneca, using molecular descriptors from SimulationsPlus and AstraZeneca. CONCLUSION: The results indicate that the choice of training sets used to build the prediction models is of great importance in the resulting model quality and that the SCI values calculated for these models were very similar to their Kendall τ values, leading to our suggestion of an approach to use this SALI/SCI paradigm to evaluate predictive model performance that will allow more informed decisions regarding model utility. 
The use of SALI graphs and curves provides an additional level of quality assessment for predictive models. BioMed Central 2011-02-07 /pmc/articles/PMC3045354/ /pubmed/21299868 http://dx.doi.org/10.1186/1758-2946-3-7 Text en Copyright ©2011 LeDonne et al; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045354/ https://www.ncbi.nlm.nih.gov/pubmed/21299868 http://dx.doi.org/10.1186/1758-2946-3-7 |