Benchmarking Cross-Docking Strategies for Structure-Informed Machine Learning in Kinase Drug Discovery

In recent years machine learning has transformed many aspects of the drug discovery process including small molecule design for which the prediction of the bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has gr...

Bibliographic Details
Main authors: Schaller, David, Christ, Clara D., Chodera, John D., Volkamer, Andrea
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515787/
https://www.ncbi.nlm.nih.gov/pubmed/37745489
http://dx.doi.org/10.1101/2023.09.11.557138
_version_ 1785109019678998528
author Schaller, David
Christ, Clara D.
Chodera, John D.
Volkamer, Andrea
collection PubMed
description In recent years, machine learning has transformed many aspects of the drug discovery process, including small molecule design, for which the prediction of bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches, but is fundamentally limited by the accuracy with which protein:ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase:inhibitor complex geometries for downstream machine learning scoring approaches, we present a kinase-centric docking benchmark that assesses how well different classes of docking and pose selection strategies recapitulate experimentally observed binding modes in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures co-crystallized with 423 ATP-competitive ligands. We find that the docking methods biased by the co-crystallized ligand (utilizing shape overlap with or without maximum common substructure matching) are more successful in recovering binding poses than standard physics-based docking alone. Also, docking into multiple structures significantly increases the chance of generating a low-RMSD docking pose. Docking with an approach that combines all three methods (Posit) into structures with the most similar co-crystallized ligands according to shape and electrostatics proved to be the most efficient way to reproduce binding poses, achieving a success rate of 66.9% across all included systems.
The studied docking and pose selection strategies, which utilize the OpenEye Toolkit, were implemented as pipelines in the KinoML framework, allowing automated and reliable protein:ligand complex generation for future downstream machine learning tasks. Although the benchmark focuses on protein kinases, we believe the general findings can also be transferred to other protein families.
format Online
Article
Text
id pubmed-10515787
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10515787 2023-09-23 bioRxiv Article Cold Spring Harbor Laboratory 2023-09-14 /pmc/articles/PMC10515787/ /pubmed/37745489 http://dx.doi.org/10.1101/2023.09.11.557138 Text en This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Benchmarking Cross-Docking Strategies for Structure-Informed Machine Learning in Kinase Drug Discovery
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515787/
https://www.ncbi.nlm.nih.gov/pubmed/37745489
http://dx.doi.org/10.1101/2023.09.11.557138