
Evaluation of QSAR Equations for Virtual Screening

Quantitative Structure Activity Relationship (QSAR) models can provide information on the correlation between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable pr...


Bibliographic Details
Main authors: Spiegel, Jacob, Senderowitz, Hanoch
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7672587/
https://www.ncbi.nlm.nih.gov/pubmed/33105703
http://dx.doi.org/10.3390/ijms21217828
_version_ 1783611165435232256
author Spiegel, Jacob
Senderowitz, Hanoch
author_facet Spiegel, Jacob
Senderowitz, Hanoch
author_sort Spiegel, Jacob
collection PubMed
description Quantitative Structure Activity Relationship (QSAR) models can provide information on the correlation between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable properties. Because of the large number of calculable descriptors and, consequently, the much larger number of descriptor combinations, the derivation of QSAR models can be treated as an optimization problem. For continuous responses, the metrics typically optimized in this process relate to model performance on the training set, for example, [Formula: see text] and [Formula: see text]. Similar metrics, calculated on an external set of data (e.g., [Formula: see text]), are used to evaluate the performance of the final models. A common theme of these metrics is that they are context-“ignorant”. In this work we propose that QSAR models should be evaluated based on their intended usage. More specifically, we argue that QSAR models developed for Virtual Screening (VS) should be derived and evaluated using a virtual screening-aware metric, e.g., an enrichment-based metric. To demonstrate this point, we developed 21 Multiple Linear Regression (MLR) models for seven targets (three models per target), evaluated them first on validation sets, and subsequently tested their performance on two additional test sets constructed to mimic small-scale virtual screening campaigns. As expected, we found no correlation between model performance evaluated by “classical” metrics, e.g., [Formula: see text] and [Formula: see text], and the number of active compounds picked by the models from within a pool of random compounds. In particular, in some cases models with favorable [Formula: see text] and/or [Formula: see text] values were unable to pick a single active compound from within the pool, whereas in other cases models with poor [Formula: see text] and/or [Formula: see text] values performed well in the context of virtual screening. We also found no significant correlation between the numbers of active compounds correctly identified by the models in the training, validation and test sets. Next, we developed a new algorithm for the derivation of MLR models by optimizing an enrichment-based metric and tested its performance on the same datasets. We found that the best models derived in this manner showed, in most cases, much more consistent results across the training, validation and test sets and outperformed the corresponding MLR models in most virtual screening tests. Finally, we demonstrated that when tested as binary classifiers, models derived for the same targets by the new algorithm outperformed Random Forest (RF) and Support Vector Machine (SVM)-based models across the training, validation and test sets in most cases. We attribute the better performance of the Enrichment Optimizer Algorithm (EOA) models in VS to better handling of inactive random compounds. Optimizing an enrichment-based metric is therefore a promising strategy for the derivation of QSAR models for classification and virtual screening.
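The contrast the abstract draws between "classical" fit metrics and an enrichment-based metric can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record elides the formulas the authors used, so the coefficient of determination stands in for a classical metric, the enrichment factor is the common textbook definition rather than the paper's own metric, and the function names and toy data are hypothetical.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed 'classical' metric)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def actives_in_top(scores: Sequence[float], is_active: Sequence[bool], top_n: int) -> int:
    """Count the active compounds among the top_n highest-scoring compounds."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, active in ranked[:top_n] if active)


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool], top_n: int) -> float:
    """Hit rate within the top_n selection divided by the hit rate of the whole pool."""
    hit_rate_top = actives_in_top(scores, is_active, top_n) / top_n
    hit_rate_pool = sum(is_active) / len(is_active)
    return hit_rate_top / hit_rate_pool


# Hypothetical pool: 2 known actives hidden among 8 random (presumed inactive) compounds.
is_active = [True, True, False, False, False, False, False, False, False, False]
# Hypothetical predicted activities from two models scoring the same pool.
scores_a = [9.0, 8.5, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.2, 0.1]  # actives ranked first
scores_b = [5.0, 4.5, 9.0, 8.0, 3.0, 2.0, 1.0, 0.5, 0.2, 0.1]  # inactives ranked first

ef_a = enrichment_factor(scores_a, is_active, top_n=2)
ef_b = enrichment_factor(scores_b, is_active, top_n=2)
print(f"model A: EF = {ef_a:.1f}, model B: EF = {ef_b:.1f}")
```

The fit metric only looks at compounds with measured activities, while the enrichment factor depends on how the model ranks the whole pool, including the random compounds. A model can therefore score well on one and poorly on the other, which is the kind of mismatch the abstract reports.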
format Online
Article
Text
id pubmed-7672587
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7672587 2020-11-19 Evaluation of QSAR Equations for Virtual Screening Spiegel, Jacob Senderowitz, Hanoch Int J Mol Sci Article MDPI 2020-10-22 /pmc/articles/PMC7672587/ /pubmed/33105703 http://dx.doi.org/10.3390/ijms21217828 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Spiegel, Jacob
Senderowitz, Hanoch
Evaluation of QSAR Equations for Virtual Screening
title Evaluation of QSAR Equations for Virtual Screening
title_full Evaluation of QSAR Equations for Virtual Screening
title_fullStr Evaluation of QSAR Equations for Virtual Screening
title_full_unstemmed Evaluation of QSAR Equations for Virtual Screening
title_short Evaluation of QSAR Equations for Virtual Screening
title_sort evaluation of qsar equations for virtual screening
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7672587/
https://www.ncbi.nlm.nih.gov/pubmed/33105703
http://dx.doi.org/10.3390/ijms21217828
work_keys_str_mv AT spiegeljacob evaluationofqsarequationsforvirtualscreening
AT senderowitzhanoch evaluationofqsarequationsforvirtualscreening