Virtual Screening with Gnina 1.0

Virtual screening—predicting which compounds within a specified compound library bind to a target molecule, typically a protein—is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster ti...

Bibliographic Details
Main authors: Sunseri, Jocelyn; Koes, David Ryan
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659095/
https://www.ncbi.nlm.nih.gov/pubmed/34885952
http://dx.doi.org/10.3390/molecules26237369
_version_ 1784612884565721088
author Sunseri, Jocelyn
Koes, David Ryan
author_facet Sunseri, Jocelyn
Koes, David Ryan
author_sort Sunseri, Jocelyn
collection PubMed
description Virtual screening—predicting which compounds within a specified compound library bind to a target molecule, typically a protein—is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.
format Online
Article
Text
id pubmed-8659095
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8659095 2021-12-10 Virtual Screening with Gnina 1.0 Sunseri, Jocelyn Koes, David Ryan Molecules Article MDPI 2021-12-04 /pmc/articles/PMC8659095/ /pubmed/34885952 http://dx.doi.org/10.3390/molecules26237369 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Sunseri, Jocelyn
Koes, David Ryan
Virtual Screening with Gnina 1.0
title Virtual Screening with Gnina 1.0
title_full Virtual Screening with Gnina 1.0
title_fullStr Virtual Screening with Gnina 1.0
title_full_unstemmed Virtual Screening with Gnina 1.0
title_short Virtual Screening with Gnina 1.0
title_sort virtual screening with gnina 1.0
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659095/
https://www.ncbi.nlm.nih.gov/pubmed/34885952
http://dx.doi.org/10.3390/molecules26237369
work_keys_str_mv AT sunserijocelyn virtualscreeningwithgnina10
AT koesdavidryan virtualscreeningwithgnina10