Practical Model Selection for Prospective Virtual Screening
Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the data set and evaluation strategy. We consider a wide range of ligand-based machine learning and do...
Main authors: | Liu, Shengchao; Alnammi, Moayad; Ericksen, Spencer S.; Voter, Andrew F.; Ananiev, Gene E.; Keck, James L.; Hoffmann, F. Michael; Wildman, Scott A.; Gitter, Anthony |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2018 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6351977/ https://www.ncbi.nlm.nih.gov/pubmed/30500183 http://dx.doi.org/10.1021/acs.jcim.8b00363 |
author | Liu, Shengchao; Alnammi, Moayad; Ericksen, Spencer S.; Voter, Andrew F.; Ananiev, Gene E.; Keck, James L.; Hoffmann, F. Michael; Wildman, Scott A.; Gitter, Anthony |
---|---|
collection | PubMed |
description | Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the data set and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein–protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing which algorithm is best for prospective compound prioritization. Our workflow identifies a random forest as the best algorithm for these targets over more sophisticated neural network-based models. The top 250 predictions from our selected random forest recover 37 of the 54 active compounds from a library of 22,434 new molecules assayed on PriA-SSB. We show that virtual screening methods that perform well on public data sets and synthetic benchmarks, like multi-task neural networks, may not always translate to prospective screening performance on a specific assay of interest. |
format | Online Article Text |
id | pubmed-6351977 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-6351977 2019-01-31 J Chem Inf Model 2018-11-30 2019-01-28 Text en Copyright © 2018 American Chemical Society. This is an open access article published under a Creative Commons Attribution (CC-BY) License (http://pubs.acs.org/page/policy/authorchoice_ccby_termsofuse.html), which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited. |
title | Practical Model Selection for Prospective Virtual Screening |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6351977/ https://www.ncbi.nlm.nih.gov/pubmed/30500183 http://dx.doi.org/10.1021/acs.jcim.8b00363 |