Performance of machine-learning scoring functions in structure-based virtual screening

Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specif...

Bibliographic Details
Main authors: Wójcikowski, Maciej; Ballester, Pedro J.; Siedlecki, Pawel
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404222/
https://www.ncbi.nlm.nih.gov/pubmed/28440302
http://dx.doi.org/10.1038/srep46710
_version_ 1783231555228925952
author Wójcikowski, Maciej
Ballester, Pedro J.
Siedlecki, Pawel
author_facet Wójcikowski, Maciej
Ballester, Pedro J.
Siedlecki, Pawel
author_sort Wójcikowski, Maciej
collection PubMed
description Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show that RF-Score-VS can substantially improve virtual screening performance: the top 1% of molecules ranked by RF-Score-VS gives a 55.6% hit rate, whereas the top 1% ranked by Vina gives only 16.2% (for smaller percentages the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate versus 27.5% using Vina). In addition, RF-Score-VS provides a much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we tested RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as a ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
format Online
Article
Text
id pubmed-5404222
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5404222 2017-04-27 Performance of machine-learning scoring functions in structure-based virtual screening. Wójcikowski, Maciej; Ballester, Pedro J.; Siedlecki, Pawel. Sci Rep. Article. Nature Publishing Group 2017-04-25. /pmc/articles/PMC5404222/ /pubmed/28440302 http://dx.doi.org/10.1038/srep46710 Text en. Copyright © 2017, The Author(s). http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
spellingShingle Article
Wójcikowski, Maciej
Ballester, Pedro J.
Siedlecki, Pawel
Performance of machine-learning scoring functions in structure-based virtual screening
title Performance of machine-learning scoring functions in structure-based virtual screening
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404222/
https://www.ncbi.nlm.nih.gov/pubmed/28440302
http://dx.doi.org/10.1038/srep46710
work_keys_str_mv AT wojcikowskimaciej performanceofmachinelearningscoringfunctionsinstructurebasedvirtualscreening
AT ballesterpedroj performanceofmachinelearningscoringfunctionsinstructurebasedvirtualscreening
AT siedleckipawel performanceofmachinelearningscoringfunctionsinstructurebasedvirtualscreening
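
The description above reports two kinds of metric: the hit rate among the top-ranked fraction of a screened library (top 1%, top 0.1%) and the Pearson correlation between predicted scores and measured binding affinity. The following is a minimal sketch of how such metrics are commonly computed. It is not the authors' evaluation code: the function names `hit_rate_at_top` and `pearson` are hypothetical, the data are toy values, and it assumes that a higher score means a better-ranked molecule and that the top-fraction size is rounded to the nearest whole molecule.

```python
import math


def hit_rate_at_top(scores, is_active, fraction):
    """Share of actives among the top-scoring `fraction` of molecules.

    Assumption: higher score = better rank. The number of molecules kept
    is rounded to the nearest integer and is never less than one.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not scores:
        raise ValueError("at least one molecule is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, int(round(len(scores) * fraction)))
    # Rank molecules from best to worst score and keep the first n_top.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    n_hits = sum(1 for _, active in ranked[:n_top] if active)
    return n_hits / n_top


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy data (illustrative only, not taken from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(hit_rate_at_top(toy_scores, toy_labels, 0.2))  # 1.0 (both of the top 2 are active)
print(hit_rate_at_top(toy_scores, toy_labels, 0.5))  # 0.6 (3 of the top 5 are active)
```

For scale, if the 15 426 actives and 893 897 inactives quoted in the description were pooled into one library, a random pick would be active about 1.7% of the time (15 426 / 909 323); the paper's own per-target averaging may differ from this pooled figure.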