
Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction


Bibliographic Details
Main authors: Rodríguez-Pérez, Raquel; Vogt, Martin; Bajorath, Jürgen
Format: Online Article Text
Language: English
Published: American Chemical Society 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6045367/
https://www.ncbi.nlm.nih.gov/pubmed/30023518
http://dx.doi.org/10.1021/acsomega.7b01079
author Rodríguez-Pérez, Raquel
Vogt, Martin
Bajorath, Jürgen
collection PubMed
description In computational chemistry and chemoinformatics, the support vector machine (SVM) algorithm is among the most widely used machine learning methods for the identification of new active compounds. In addition, support vector regression (SVR) has become a preferred approach for modeling nonlinear structure–activity relationships and predicting compound potency values. For the closely related SVM and SVR methods, fingerprints (i.e., bit string or feature set representations of chemical structure and properties) are generally preferred descriptors. Herein, we have compared SVM and SVR calculations for the same compound data sets to evaluate which features are responsible for predictions. On the basis of systematic feature weight analysis, rather surprising results were obtained. Fingerprint features were frequently identified that contributed differently to the corresponding SVM and SVR models. The overlap between feature sets determining the predictive performance of SVM and SVR was only very small. Furthermore, features were identified that had opposite effects on SVM and SVR predictions. Feature weight analysis in combination with feature mapping made it also possible to interpret individual predictions, thus balancing the black box character of SVM/SVR modeling.
format Online
Article
Text
id pubmed-6045367
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-6045367 2018-07-16 Rodríguez-Pérez, Raquel; Vogt, Martin; Bajorath, Jürgen. Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction. ACS Omega. American Chemical Society 2017-10-04 /pmc/articles/PMC6045367/ /pubmed/30023518 http://dx.doi.org/10.1021/acsomega.7b01079 Text en Copyright © 2017 American Chemical Society. This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html), which permits copying and redistribution of the article or any adaptations for non-commercial purposes.
title Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6045367/
https://www.ncbi.nlm.nih.gov/pubmed/30023518
http://dx.doi.org/10.1021/acsomega.7b01079