AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection

Bibliographic Details
Main Authors: Clyde, Austin, Liu, Xuefeng, Brettin, Thomas, Yoo, Hyunseung, Partin, Alexander, Babuji, Yadu, Blaiszik, Ben, Mohd-Yusof, Jamaludin, Merzky, Andre, Turilli, Matteo, Jha, Shantenu, Ramanathan, Arvind, Stevens, Rick
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9901402/
https://www.ncbi.nlm.nih.gov/pubmed/36747041
http://dx.doi.org/10.1038/s41598-023-28785-9
_version_ 1784883023590719488
author Clyde, Austin
Liu, Xuefeng
Brettin, Thomas
Yoo, Hyunseung
Partin, Alexander
Babuji, Yadu
Blaiszik, Ben
Mohd-Yusof, Jamaludin
Merzky, Andre
Turilli, Matteo
Jha, Shantenu
Ramanathan, Arvind
Stevens, Rick
author_facet Clyde, Austin
Liu, Xuefeng
Brettin, Thomas
Yoo, Hyunseung
Partin, Alexander
Babuji, Yadu
Blaiszik, Ben
Mohd-Yusof, Jamaludin
Merzky, Andre
Turilli, Matteo
Jha, Shantenu
Ramanathan, Arvind
Stevens, Rick
author_sort Clyde, Austin
collection PubMed
description Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50k predictions per GPU second). We showcase a workflow for docking utilizing surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate of less than 0.01% in detecting the underlying best-scoring 0.1% of compounds. Our analysis of the speedup explains that another order of magnitude speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million “in-stock” molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100× or even 1000× faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries.
format Online
Article
Text
id pubmed-9901402
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9901402 2023-02-07 AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection Clyde, Austin Liu, Xuefeng Brettin, Thomas Yoo, Hyunseung Partin, Alexander Babuji, Yadu Blaiszik, Ben Mohd-Yusof, Jamaludin Merzky, Andre Turilli, Matteo Jha, Shantenu Ramanathan, Arvind Stevens, Rick Sci Rep Article Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50k predictions per GPU second). We showcase a workflow for docking utilizing surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate of less than 0.01% in detecting the underlying best-scoring 0.1% of compounds. Our analysis of the speedup explains that another order of magnitude speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million “in-stock” molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100× or even 1000× faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries. Nature Publishing Group UK 2023-02-06 /pmc/articles/PMC9901402/ /pubmed/36747041 http://dx.doi.org/10.1038/s41598-023-28785-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Clyde, Austin
Liu, Xuefeng
Brettin, Thomas
Yoo, Hyunseung
Partin, Alexander
Babuji, Yadu
Blaiszik, Ben
Mohd-Yusof, Jamaludin
Merzky, Andre
Turilli, Matteo
Jha, Shantenu
Ramanathan, Arvind
Stevens, Rick
AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection
title AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9901402/
https://www.ncbi.nlm.nih.gov/pubmed/36747041
http://dx.doi.org/10.1038/s41598-023-28785-9