Performance of machine-learning scoring functions in structure-based virtual screening
Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specif...
Main authors: | Wójcikowski, Maciej; Ballester, Pedro J.; Siedlecki, Pawel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404222/ https://www.ncbi.nlm.nih.gov/pubmed/28440302 http://dx.doi.org/10.1038/srep46710 |
_version_ | 1783231555228925952 |
---|---|
author | Wójcikowski, Maciej Ballester, Pedro J. Siedlecki, Pawel |
author_facet | Wójcikowski, Maciej Ballester, Pedro J. Siedlecki, Pawel |
author_sort | Wójcikowski, Maciej |
collection | PubMed |
description | Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show RF-Score-VS can substantially improve virtual screening performance: RF-Score-VS top 1% provides 55.6% hit rate, whereas that of Vina only 16.2% (for smaller percent the difference is even more encouraging: RF-Score-VS top 0.1% achieves 88.6% hit rate for 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary). |
format | Online Article Text |
id | pubmed-5404222 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-5404222 2017-04-27 Performance of machine-learning scoring functions in structure-based virtual screening Wójcikowski, Maciej Ballester, Pedro J. Siedlecki, Pawel Sci Rep Article Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show RF-Score-VS can substantially improve virtual screening performance: RF-Score-VS top 1% provides 55.6% hit rate, whereas that of Vina only 16.2% (for smaller percent the difference is even more encouraging: RF-Score-VS top 0.1% achieves 88.6% hit rate for 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary). Nature Publishing Group 2017-04-25 /pmc/articles/PMC5404222/ /pubmed/28440302 http://dx.doi.org/10.1038/srep46710 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Wójcikowski, Maciej Ballester, Pedro J. Siedlecki, Pawel Performance of machine-learning scoring functions in structure-based virtual screening |
title | Performance of machine-learning scoring functions in structure-based virtual screening |
title_full | Performance of machine-learning scoring functions in structure-based virtual screening |
title_fullStr | Performance of machine-learning scoring functions in structure-based virtual screening |
title_full_unstemmed | Performance of machine-learning scoring functions in structure-based virtual screening |
title_short | Performance of machine-learning scoring functions in structure-based virtual screening |
title_sort | performance of machine-learning scoring functions in structure-based virtual screening |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404222/ https://www.ncbi.nlm.nih.gov/pubmed/28440302 http://dx.doi.org/10.1038/srep46710 |
work_keys_str_mv | AT wojcikowskimaciej performanceofmachinelearningscoringfunctionsinstructurebasedvirtualscreening AT ballesterpedroj performanceofmachinelearningscoringfunctionsinstructurebasedvirtualscreening AT siedleckipawel performanceofmachinelearningscoringfunctionsinstructurebasedvirtualscreening |
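
The description above reports virtual screening performance as a hit rate in the top 1% and top 0.1% of ranked molecules. As an illustration only, that metric can be sketched as the fraction of known actives among the best-scoring molecules. The function name `hit_rate_at_top` and the toy data are hypothetical and are not taken from the article or from the RF-Score-VS repositories; the sketch assumes that a higher score means a stronger predicted binder (for scores where lower is better, such as docking energies, negate them first).

```python
from typing import Sequence


def hit_rate_at_top(
    scores: Sequence[float], is_active: Sequence[bool], fraction: float
) -> float:
    """Return the fraction of actives among the top-scoring `fraction` of molecules.

    Hypothetical helper: assumes higher score = better predicted binder.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if len(scores) == 0:
        raise ValueError("at least one molecule is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")

    # Number of molecules kept; always keep at least one.
    n_top = max(1, int(len(scores) * fraction))

    # Rank molecules by score, best first.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)

    # Count the actives that landed in the selected top slice.
    hits = sum(1 for _, active in ranked[:n_top] if active)
    return hits / n_top


# Toy data (made up for illustration): 10 molecules, 3 of them active.
toy_scores = [9.1, 8.7, 8.2, 7.9, 7.5, 6.0, 5.5, 5.0, 4.2, 3.1]
toy_active = [True, True, False, True, False, False, False, False, False, False]

top20 = hit_rate_at_top(toy_scores, toy_active, 0.2)  # top 2 molecules
top50 = hit_rate_at_top(toy_scores, toy_active, 0.5)  # top 5 molecules
```

With this toy ranking, both of the two best-scoring molecules are active, while three of the best five are, so the hit rate drops as the selected slice grows.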