
Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance

BACKGROUND: In structure-based virtual screening, the choice of docking program is essential to the success of hit identification. Benchmarks are meant to guide this choice, especially when they cover a large variety of protein targets. Here, the performance of four popular virtual screening programs, Gold, Glide, Surflex and FlexX, is compared using the Directory of Useful Decoys-Enhanced database (DUD-E), which includes 102 targets with an average of 224 ligands per target and 50 decoys per ligand, generated so as to avoid biases in the benchmarking. The relationship between program performance and the properties of the targets or of the small molecules was then investigated. RESULTS: The comparison was based on two metrics, each with three different parameters. The BEDROC scores with α = 80.5 indicated that, over the whole database, Glide succeeded (score > 0.5) for 30 targets, Gold for 27, FlexX for 14 and Surflex for 11. Performance depended neither on the hydrophobicity nor on the openness of the protein cavities, nor on the families to which the proteins belong. However, despite the care taken in constructing the DUD-E database, the small differences that remain between the actives and the decoys likely explain the successes of Gold, Surflex and FlexX. Moreover, the similarity between the actives of a target and its crystal-structure ligand seems to underlie the good performance of Glide. When all targets with significant biases are removed from the benchmark, a subset of 47 targets remains, for which Glide succeeded for only 5 targets, Gold for 4, and FlexX and Surflex for 2 each. CONCLUSION: The dramatic drop in the performance of all four programs when the biases are removed shows that virtual screening benchmarks should be treated with caution, because good performance may be obtained for the wrong reasons. Benchmarking therefore hardly provides guidelines for virtual screening experiments, although one tendency is maintained: Glide and Gold display better performance than FlexX and Surflex. We recommend always using several programs and combining their results. [Figure: see text] ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0167-x) contains supplementary material, which is available to authorized users.
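For readers unfamiliar with the early-recognition metric cited above: BEDROC (Truchon and Bayly, J. Chem. Inf. Model. 2007) rescales the exponentially weighted RIE enrichment score onto [0, 1], and with α = 80.5 about 80% of a molecule's possible contribution comes from the top ~2% of the ranked list (since ln 5 / 80.5 ≈ 0.02). The sketch below is a minimal, self-contained implementation of the published formula, not the authors' evaluation code; the function and variable names are illustrative.

```python
import math

def bedroc(active_ranks, n_total, alpha=80.5):
    """BEDROC (Truchon & Bayly 2007) from the 1-based ranks of the actives.

    active_ranks -- positions of the known actives in the score-sorted list
    n_total      -- total number of ranked molecules (actives + decoys)
    alpha        -- early-recognition weight; 80.5 emphasizes the top ~2%
    """
    n = len(active_ranks)
    ra = n / n_total  # fraction of actives in the whole list

    # RIE: exponentially weighted sum over the active ranks, normalized by
    # its expected value when the actives are spread uniformly in the list.
    weighted_sum = sum(math.exp(-alpha * r / n_total) for r in active_ranks)
    random_sum = ra * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    rie = weighted_sum / random_sum

    # Affine rescaling that maps the worst possible RIE to ~0 and the best to ~1.
    factor = ra * math.sinh(alpha / 2.0) / (
        math.cosh(alpha / 2.0) - math.cosh(alpha / 2.0 - alpha * ra)
    )
    constant = 1.0 / (1.0 - math.exp(alpha * (1.0 - ra)))
    return rie * factor + constant

# DUD-E-sized toy example: 224 actives with 50 decoys each give
# 224 * 51 = 11,424 ranked molecules for one target.
print(bedroc(range(1, 225), 224 * 51))          # all actives on top -> ~1.0
print(bedroc(range(26, 11425, 51), 224 * 51))   # evenly diluted     -> ~0.02
```

The success criterion used in the abstract (BEDROC > 0.5 at α = 80.5) thus requires a large share of the actives to land within roughly the top 2% of each ranked database. The closing recommendation, running several programs and combining their results, can be realized for instance by mean-rank consensus; the paper recommends combining but does not prescribe a specific scheme, so the following is only one common choice:

```python
def consensus_rank(rankings):
    """Order molecules by their average rank over several programs.

    rankings -- list of dicts mapping molecule id -> 1-based rank;
                molecules missing from a list get that list's worst rank + 1.
    """
    ids = set().union(*rankings)
    mean = {m: sum(r.get(m, len(r) + 1) for r in rankings) / len(rankings)
            for m in ids}
    return sorted(ids, key=mean.get)  # best (lowest mean rank) first
```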


Bibliographic Details
Main Authors: Chaput, Ludovic; Martinez-Sanz, Juan; Saettel, Nicolas; Mouawad, Liliane
Format: Online Article Text
Language: English
Published: Springer International Publishing, 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5066283/
https://www.ncbi.nlm.nih.gov/pubmed/27803745
http://dx.doi.org/10.1186/s13321-016-0167-x
collection PubMed
id pubmed-5066283
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal J Cheminform
published online 2016-10-17
license © The Author(s) 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article
work_keys_str_mv AT chaputludovic benchmarkoffourpopularvirtualscreeningprogramsconstructionoftheactivedecoydatasetremainsamajordeterminantofmeasuredperformance
AT martinezsanzjuan benchmarkoffourpopularvirtualscreeningprogramsconstructionoftheactivedecoydatasetremainsamajordeterminantofmeasuredperformance
AT saettelnicolas benchmarkoffourpopularvirtualscreeningprogramsconstructionoftheactivedecoydatasetremainsamajordeterminantofmeasuredperformance
AT mouawadliliane benchmarkoffourpopularvirtualscreeningprogramsconstructionoftheactivedecoydatasetremainsamajordeterminantofmeasuredperformance