Performance of machine-learning scoring functions in structure-based virtual screening

Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specif...

Bibliographic Details
Main Authors:	Wójcikowski, Maciej; Ballester, Pedro J.; Siedlecki, Pawel
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group 2017
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404222/ https://www.ncbi.nlm.nih.gov/pubmed/28440302 http://dx.doi.org/10.1038/srep46710

_version_	1783231555228925952
author	Wójcikowski, Maciej; Ballester, Pedro J.; Siedlecki, Pawel
collection	PubMed
description	Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show RF-Score-VS can substantially improve virtual screening performance: the RF-Score-VS top 1% provides a 55.6% hit rate, whereas that of Vina is only 16.2% (for smaller percentages the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate versus 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we tested RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
format	Online Article Text
id	pubmed-5404222
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Nature Publishing Group
record_format	MEDLINE/PubMed
spelling	pubmed-5404222 2017-04-27 Performance of machine-learning scoring functions in structure-based virtual screening Wójcikowski, Maciej; Ballester, Pedro J.; Siedlecki, Pawel Sci Rep Article Nature Publishing Group 2017-04-25 /pmc/articles/PMC5404222/ /pubmed/28440302 http://dx.doi.org/10.1038/srep46710 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title	Performance of machine-learning scoring functions in structure-based virtual screening
topic	Article