
Practical Model Selection for Prospective Virtual Screening

Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the data set and evaluation strategy. We consider a wide range of ligand-based machine learning and do...


Bibliographic Details
Main authors: Liu, Shengchao, Alnammi, Moayad, Ericksen, Spencer S., Voter, Andrew F., Ananiev, Gene E., Keck, James L., Hoffmann, F. Michael, Wildman, Scott A., Gitter, Anthony
Format: Online Article Text
Language: English
Published: American Chemical Society 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6351977/
https://www.ncbi.nlm.nih.gov/pubmed/30500183
http://dx.doi.org/10.1021/acs.jcim.8b00363
author Liu, Shengchao
Alnammi, Moayad
Ericksen, Spencer S.
Voter, Andrew F.
Ananiev, Gene E.
Keck, James L.
Hoffmann, F. Michael
Wildman, Scott A.
Gitter, Anthony
collection PubMed
description Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the data set and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein–protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing which algorithm is best for prospective compound prioritization. Our workflow identifies a random forest as the best algorithm for these targets over more sophisticated neural network-based models. The top 250 predictions from our selected random forest recover 37 of the 54 active compounds from a library of 22,434 new molecules assayed on PriA-SSB. We show that virtual screening methods that perform well on public data sets and synthetic benchmarks, like multi-task neural networks, may not always translate to prospective screening performance on a specific assay of interest.
format Online
Article
Text
id pubmed-6351977
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-6351977 2019-01-31 Practical Model Selection for Prospective Virtual Screening Liu, Shengchao Alnammi, Moayad Ericksen, Spencer S. Voter, Andrew F. Ananiev, Gene E. Keck, James L. Hoffmann, F. Michael Wildman, Scott A. Gitter, Anthony J Chem Inf Model Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the data set and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein–protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing which algorithm is best for prospective compound prioritization. Our workflow identifies a random forest as the best algorithm for these targets over more sophisticated neural network-based models. The top 250 predictions from our selected random forest recover 37 of the 54 active compounds from a library of 22,434 new molecules assayed on PriA-SSB. We show that virtual screening methods that perform well on public data sets and synthetic benchmarks, like multi-task neural networks, may not always translate to prospective screening performance on a specific assay of interest. American Chemical Society 2018-11-30 2019-01-28 /pmc/articles/PMC6351977/ /pubmed/30500183 http://dx.doi.org/10.1021/acs.jcim.8b00363 Text en Copyright © 2018 American Chemical Society This is an open access article published under a Creative Commons Attribution (CC-BY) License (http://pubs.acs.org/page/policy/authorchoice_ccby_termsofuse.html), which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited.
title Practical Model Selection for Prospective Virtual Screening
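
As a rough illustration of the retrieval figures quoted in the description (37 of the 54 actives among the top 250 predictions from a library of 22,434 molecules), the following Python sketch computes the hit rate, the recall, and the enrichment over random selection. The function name `top_k_summary` and the choice of these three summary metrics are illustrative assumptions; this is not the authors' evaluation code, and the record does not state which metrics they used.

```python
def top_k_summary(hits_in_top_k, k, total_actives, library_size):
    """Summarize a top-k selection (hypothetical helper for illustration).

    Returns (hit_rate, recall, enrichment), where enrichment is the hit rate
    in the top k divided by the hit rate expected from random selection.
    """
    hit_rate = hits_in_top_k / k              # fraction of selected compounds that are active
    recall = hits_in_top_k / total_actives    # fraction of all actives that were recovered
    base_rate = total_actives / library_size  # hit rate expected when picking at random
    enrichment = hit_rate / base_rate
    return hit_rate, recall, enrichment


# Figures taken from the description: PriA-SSB prospective screen.
hit_rate, recall, enrichment = top_k_summary(
    hits_in_top_k=37, k=250, total_actives=54, library_size=22434
)
print(f"hit rate {hit_rate:.1%}, recall {recall:.1%}, enrichment {enrichment:.1f}x")
```

Under these definitions, the selected compounds are active at roughly 60 times the rate expected from picking 250 molecules at random, while covering about two thirds of all actives in the library.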