
A Hybrid Docking and Machine Learning Approach to Enhance the Performance of Virtual Screening Carried out on Protein–Protein Interfaces

The modulation of protein–protein interactions (PPIs) by small chemical compounds is challenging. PPIs play a critical role in most cellular processes and are involved in numerous disease pathways. As such, novel strategies that assist the design of PPI inhibitors are of major importance. We previou...


Bibliographic Details
Main authors: Singh, Natesh; Villoutreix, Bruno O.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9694378/
https://www.ncbi.nlm.nih.gov/pubmed/36430841
http://dx.doi.org/10.3390/ijms232214364
author Singh, Natesh
Villoutreix, Bruno O.
collection PubMed
description The modulation of protein–protein interactions (PPIs) by small chemical compounds is challenging. PPIs play a critical role in most cellular processes and are involved in numerous disease pathways. As such, novel strategies that assist the design of PPI inhibitors are of major importance. We previously reported that the knowledge-based DLIGAND2 scoring tool was the best rescoring function for improving receptor-based virtual screening (VS) performed with the Surflex docking engine applied to several PPI targets with experimentally known active and inactive compounds. Here, we extend our investigation by assessing the VS potential of other types of scoring functions, with an emphasis on docking-pose derived solvent accessible surface area (SASA) descriptors, with or without the use of machine learning (ML) classifiers. First, we explored rescoring strategies of Surflex-generated docking poses with five GOLD scoring functions (GoldScore, ChemScore, ASP, ChemPLP, ChemScore with Receptor Depth Scaling) and with consensus scoring. The top-ranked poses were post-processed to derive a set of protein and ligand SASA descriptors in the bound and unbound states, which were combined to derive descriptors of the docked protein-ligand complexes. Further, eight ML models (tree, bagged forest, random forest, Bayesian, support vector machine, logistic regression, neural network, and neural network with bagging) were trained using the derived SASA descriptors and validated on test sets. The results show that many SASA descriptors are better than Surflex and GOLD scoring functions in terms of overall performance and early recovery success on the dataset used. The ML models were superior to all scoring functions and rescoring approaches for most targets, yielding up to a seven-fold increase in enrichment factors at 1% of the screened collections. In particular, the neural networks and random forest-based ML emerged as the best techniques for this PPI dataset, making them robust and attractive VS tools for hit-finding efforts. The presented results suggest that exploring further docking-pose derived SASA descriptors could be valuable for structure-based virtual screening projects, and in the present case, to assist the rational design of small-molecule PPI inhibitors.
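The description reports gains as enrichment factors at 1% of the screened collection. This record does not give the formula, so the sketch below assumes the commonly used definition: the hit rate among the top-ranked fraction divided by the hit rate in the whole collection. The function name `enrichment_factor` and its parameters are hypothetical, and the sketch assumes that a higher score means a compound is predicted more likely to be active.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at the top `fraction` of a ranked collection.

    scores: one score per compound (assumed: higher = more likely active).
    labels: 1 for experimentally active, 0 for inactive, same order as scores.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")

    # Size of the top-ranked subset (at least one compound).
    n_top = max(1, round(n_total * fraction))

    # Rank compounds from best to worst score and count actives in the top subset.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top subset relative to the hit rate in the full collection.
    return (hits_top / n_top) / (n_actives / n_total)
```

With this definition, a value of 1 corresponds to random ranking, and the maximum attainable value depends on the proportion of actives in the collection.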
format Online
Article
Text
id pubmed-9694378
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9694378 2022-11-26 Int J Mol Sci Article MDPI 2022-11-18 /pmc/articles/PMC9694378/ /pubmed/36430841 http://dx.doi.org/10.3390/ijms232214364 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Hybrid Docking and Machine Learning Approach to Enhance the Performance of Virtual Screening Carried out on Protein–Protein Interfaces
topic Article