
Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries

The emergence of ultra-large screening libraries, filled to the brim with billions of readily available compounds, poses a growing challenge for docking-based virtual screening. Machine learning (ML)-boosted strategies like the tool HASTEN combine rapid ML prediction with the brute...


Bibliographic Details
Main authors: Sivula, Toni; Yetukuri, Laxman; Kalliokoski, Tuomo; Käsnänen, Heikki; Poso, Antti; Pöhner, Ina
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10523430/
https://www.ncbi.nlm.nih.gov/pubmed/37655823
http://dx.doi.org/10.1021/acs.jcim.3c01239
_version_ 1785110564578525184
author Sivula, Toni
Yetukuri, Laxman
Kalliokoski, Tuomo
Käsnänen, Heikki
Poso, Antti
Pöhner, Ina
collection PubMed
description The emergence of ultra-large screening libraries, filled to the brim with billions of readily available compounds, poses a growing challenge for docking-based virtual screening. Machine learning (ML)-boosted strategies like the tool HASTEN combine rapid ML prediction with the brute-force docking of small fractions of such libraries to increase screening throughput and take on giga-scale libraries. In our case study of an anti-bacterial chaperone and an anti-viral kinase, we first generated a brute-force docking baseline for 1.56 billion compounds in the Enamine REAL lead-like library with the fast Glide high-throughput virtual screening protocol. With HASTEN, we observed robust recall of 90% of the true 1000 top-scoring virtual hits in both targets when docking only 1% of the entire library. This reduction of the required docking experiments by 99% significantly shortens the screening time. In the kinase target, the employment of a hydrogen bonding constraint resulted in a major proportion of unsuccessful docking attempts and hampered ML predictions. We demonstrate the optimization potential in the treatment of failed compounds when performing ML-boosted screening and benchmark and showcase HASTEN as a fast and robust tool in a growing arsenal of approaches to unlock the chemical space covered by giga-scale screening libraries for everyday drug discovery campaigns.
format Online
Article
Text
id pubmed-10523430
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10523430 2023-09-28 Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries. Sivula, Toni; Yetukuri, Laxman; Kalliokoski, Tuomo; Käsnänen, Heikki; Poso, Antti; Pöhner, Ina. J Chem Inf Model. American Chemical Society 2023-09-01 /pmc/articles/PMC10523430/ /pubmed/37655823 http://dx.doi.org/10.1021/acs.jcim.3c01239 Text en © 2023 The Authors. Published by American Chemical Society.
License: https://creativecommons.org/licenses/by/4.0/ (permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained).
title Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10523430/
https://www.ncbi.nlm.nih.gov/pubmed/37655823
http://dx.doi.org/10.1021/acs.jcim.3c01239