Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance
BACKGROUND: In structure-based virtual screening, the choice of docking program is essential to the success of hit identification. Benchmarks are meant to guide this choice, especially when undertaken on a large variety of protein targets. Here, the performance of four popular virtual screening programs, Gold, Glide, Surflex and FlexX, is compared using the Directory of Useful Decoys-Enhanced database (DUD-E).
Main Authors: | Chaput, Ludovic; Martinez-Sanz, Juan; Saettel, Nicolas; Mouawad, Liliane |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2016 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5066283/ https://www.ncbi.nlm.nih.gov/pubmed/27803745 http://dx.doi.org/10.1186/s13321-016-0167-x |
_version_ | 1782460459905974272 |
---|---|
author | Chaput, Ludovic; Martinez-Sanz, Juan; Saettel, Nicolas; Mouawad, Liliane
author_facet | Chaput, Ludovic; Martinez-Sanz, Juan; Saettel, Nicolas; Mouawad, Liliane
author_sort | Chaput, Ludovic |
collection | PubMed |
description | BACKGROUND: In structure-based virtual screening, the choice of docking program is essential to the success of hit identification. Benchmarks are meant to guide this choice, especially when undertaken on a large variety of protein targets. Here, the performance of four popular virtual screening programs, Gold, Glide, Surflex and FlexX, is compared using the Directory of Useful Decoys-Enhanced database (DUD-E), which includes 102 targets with an average of 224 ligands per target and 50 decoys per ligand, the decoys being generated so as to avoid biases in the benchmarking. The relationship between program performance and the properties of the targets or of the small molecules was then investigated. RESULTS: The comparison was based on two metrics, each computed with three different parameters. The BEDROC scores with α = 80.5 indicated that, on the overall database, Glide succeeded (score > 0.5) for 30 targets, Gold for 27, FlexX for 14 and Surflex for 11. Performance depended neither on the hydrophobicity nor on the openness of the protein cavities, nor on the families to which the proteins belong. However, despite the care taken in the construction of the DUD-E database, the small differences that remain between the actives and the decoys likely explain the successes of Gold, Surflex and FlexX. Moreover, the similarity between the actives of a target and its crystal-structure ligand seems to underlie the good performance of Glide. When all targets with significant biases are removed from the benchmark, a subset of 47 targets remains, for which Glide succeeded for only 5 targets, Gold for 4, and FlexX and Surflex for 2 each. CONCLUSION: The dramatic drop in the performance of all four programs when the biases are removed shows that virtual screening benchmarks should be treated with caution, because good performance may be due to the wrong reasons. Benchmarking therefore hardly provides guidelines for virtual screening experiments, even though the overall tendency is maintained, i.e., Glide and Gold perform better than FlexX and Surflex. We recommend always using several programs and combining their results. [Figure: see text] ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-016-0167-x) contains supplementary material, which is available to authorized users.
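The headline metric above, BEDROC with α = 80.5 and a success threshold of 0.5, follows the published definition of Truchon and Bayly (J Chem Inf Model 47:488–508, 2007): an exponentially weighted enrichment (RIE) rescaled onto [0, 1] so that large α rewards actives ranked very early. The sketch below is an independent illustration of that formula, not the authors' code; the function name and NumPy usage are assumptions.

```python
import numpy as np

def bedroc(active_ranks, n_total, alpha=80.5):
    """BEDROC score (Truchon & Bayly, 2007).

    active_ranks : 1-based ranks of the actives in the full sorted list
    n_total      : total number of ranked molecules (actives + decoys)
    alpha        : early-recognition parameter; 80.5 puts ~80 % of the
                   weight on the top 2 % of the list
    """
    r = np.asarray(active_ranks, dtype=float)
    n, N = len(r), float(n_total)
    ra = n / N  # ratio of actives in the screened library

    # RIE: exponentially weighted sum over its expectation for a random ranking
    rie = np.exp(-alpha * r / N).sum() / (
        ra * (1.0 - np.exp(-alpha)) / (np.exp(alpha / N) - 1.0))

    # Rescale RIE onto [0, 1]
    return (rie * ra * np.sinh(alpha / 2.0)
            / (np.cosh(alpha / 2.0) - np.cosh(alpha / 2.0 - alpha * ra))
            + 1.0 / (1.0 - np.exp(alpha * (1.0 - ra))))

# Perfect early recognition: 5 actives ranked 1-5 among 1000 molecules -> ~1.0
print(bedroc([1, 2, 3, 4, 5], 1000))
```

With α = 80.5 about 80 % of the score weight falls on the top 2 % of the ranked list, which is why a score above 0.5 is read as genuine early recognition. The abstract's closing recommendation, to run several programs and combine their results, is commonly implemented as consensus ranking; the rank-sum variant below is one standard approach, not necessarily the authors' procedure, and the variable names are hypothetical.

```python
def rank_sum_consensus(*rankings):
    """Combine hit lists from several docking programs by summing ranks.

    Each ranking is a dict mapping molecule id -> 1-based rank in one
    program's list; molecules with the lowest rank sum come out first.
    """
    totals = {}
    for ranking in rankings:
        for mol_id, rank in ranking.items():
            totals[mol_id] = totals.get(mol_id, 0) + rank
    return sorted(totals, key=totals.get)

# e.g. rank_sum_consensus(glide_ranks, gold_ranks, surflex_ranks, flexx_ranks)
```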
format | Online Article Text |
id | pubmed-5066283 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-5066283 2016-11-01 Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance Chaput, Ludovic; Martinez-Sanz, Juan; Saettel, Nicolas; Mouawad, Liliane J Cheminform Research Article Springer International Publishing 2016-10-17 /pmc/articles/PMC5066283/ /pubmed/27803745 http://dx.doi.org/10.1186/s13321-016-0167-x Text en © The Author(s) 2016 Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article; Chaput, Ludovic; Martinez-Sanz, Juan; Saettel, Nicolas; Mouawad, Liliane; Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance
title | Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance |
title_full | Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance |
title_fullStr | Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance |
title_full_unstemmed | Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance |
title_short | Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance |
title_sort | benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5066283/ https://www.ncbi.nlm.nih.gov/pubmed/27803745 http://dx.doi.org/10.1186/s13321-016-0167-x |
work_keys_str_mv | AT chaputludovic benchmarkoffourpopularvirtualscreeningprogramsconstructionoftheactivedecoydatasetremainsamajordeterminantofmeasuredperformance AT martinezsanzjuan benchmarkoffourpopularvirtualscreeningprogramsconstructionoftheactivedecoydatasetremainsamajordeterminantofmeasuredperformance AT saettelnicolas benchmarkoffourpopularvirtualscreeningprogramsconstructionoftheactivedecoydatasetremainsamajordeterminantofmeasuredperformance AT mouawadliliane benchmarkoffourpopularvirtualscreeningprogramsconstructionoftheactivedecoydatasetremainsamajordeterminantofmeasuredperformance |