Machine learning classification can reduce false positives in structure-based virtual screening

Bibliographic Details
Main Authors: Adeshina, Yusuf O., Deeds, Eric J., Karanicolas, John
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2020
Subjects: Biological Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7414157/
https://www.ncbi.nlm.nih.gov/pubmed/32669436
http://dx.doi.org/10.1073/pnas.2000585117
author Adeshina, Yusuf O.
Deeds, Eric J.
Karanicolas, John
collection PubMed
description With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery’s search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has IC50 280 nM, corresponding to Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and for vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
format Online
Article
Text
id pubmed-7414157
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-7414157 2020-08-21 Proc Natl Acad Sci U S A Biological Sciences National Academy of Sciences 2020-08-04 2020-07-15 Text en Copyright © 2020 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Machine learning classification can reduce false positives in structure-based virtual screening
topic Biological Sciences