Systematic Artifacts in Support Vector Regression-Based Compound Potency Prediction Revealed by Statistical and Activity Landscape Analysis

Bibliographic Details
Main authors: Balfer, Jenny; Bajorath, Jürgen
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4350943/
https://www.ncbi.nlm.nih.gov/pubmed/25742011
http://dx.doi.org/10.1371/journal.pone.0119301
_version_ 1782360257199079424
author Balfer, Jenny
Bajorath, Jürgen
collection PubMed
description Support vector machines are a popular machine learning method for many classification tasks in biology and chemistry. In addition, the support vector regression (SVR) variant is widely used for numerical property predictions. In chemoinformatics and pharmaceutical research, SVR has become the probably most popular approach for modeling of non-linear structure-activity relationships (SARs) and predicting compound potency values. Herein, we have systematically generated and analyzed SVR prediction models for a variety of compound data sets with different SAR characteristics. Although these SVR models were accurate on the basis of global prediction statistics and not prone to overfitting, they were found to consistently mispredict highly potent compounds. Hence, in regions of local SAR discontinuity, SVR prediction models displayed clear limitations. Compared to observed activity landscapes of compound data sets, landscapes generated on the basis of SVR potency predictions were partly flattened and activity cliff information was lost. Taken together, these findings have implications for practical SVR applications. In particular, prospective SVR-based potency predictions should be considered with caution because artificially low predictions are very likely for highly potent candidate compounds, the most important prediction targets.
format Online
Article
Text
id pubmed-4350943
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4350943 2015-03-17 Systematic Artifacts in Support Vector Regression-Based Compound Potency Prediction Revealed by Statistical and Activity Landscape Analysis Balfer, Jenny Bajorath, Jürgen PLoS One Research Article
Public Library of Science 2015-03-05 /pmc/articles/PMC4350943/ /pubmed/25742011 http://dx.doi.org/10.1371/journal.pone.0119301 Text en © 2015 Balfer, Bajorath http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Systematic Artifacts in Support Vector Regression-Based Compound Potency Prediction Revealed by Statistical and Activity Landscape Analysis
topic Research Article