A statistical framework to evaluate virtual screening
BACKGROUND: The receiver operating characteristic (ROC) curve is widely used to evaluate virtual screening (VS) studies. However, the method fails to address the "early recognition" problem specific to VS. Although many other metrics that emphasize "early recognition", such as RIE, BEDROC, and pROC, have been proposed, there are no rigorous statistical guidelines for determining thresholds or performing significance tests. Nor have these metrics been compared under a statistical framework to better understand their performance. RESULTS: We propose a statistical framework for evaluating VS studies in which the threshold for deciding whether a ranking method is better than random ranking can be derived by bootstrap simulation, and two ranking methods can be compared by permutation test. We found that different metrics emphasize "early recognition" differently. BEDROC and RIE are statistically equivalent metrics. Our newly proposed metric, SLR, is superior to pROC. Through extensive simulations, we observed a "seesaw effect": overemphasizing early recognition reduces the statistical power of a metric to detect true early recognition. CONCLUSION: The statistical framework developed and tested here is applicable to any other metric, even when its exact distribution is unknown. Under this framework, a threshold can easily be selected according to a pre-specified type I error rate, and statistical comparison between two ranking methods becomes possible. The theoretical null distribution of the SLR metric is available, so the SLR threshold can be determined exactly without resorting to bootstrap simulations, which makes it easy to use in practical virtual screening studies.
Main Authors: | Zhao, Wei; Hevener, Kirk E; White, Stephen W; Lee, Richard E; Boyett, James M |
Format: | Text |
Language: | English |
Published: | BioMed Central, 2009 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2722655/ https://www.ncbi.nlm.nih.gov/pubmed/19619306 http://dx.doi.org/10.1186/1471-2105-10-225 |
_version_ | 1782170322113396736 |
author | Zhao, Wei; Hevener, Kirk E; White, Stephen W; Lee, Richard E; Boyett, James M |
author_facet | Zhao, Wei; Hevener, Kirk E; White, Stephen W; Lee, Richard E; Boyett, James M |
author_sort | Zhao, Wei |
collection | PubMed |
description | BACKGROUND: The receiver operating characteristic (ROC) curve is widely used to evaluate virtual screening (VS) studies. However, the method fails to address the "early recognition" problem specific to VS. Although many other metrics that emphasize "early recognition", such as RIE, BEDROC, and pROC, have been proposed, there are no rigorous statistical guidelines for determining thresholds or performing significance tests. Nor have these metrics been compared under a statistical framework to better understand their performance. RESULTS: We propose a statistical framework for evaluating VS studies in which the threshold for deciding whether a ranking method is better than random ranking can be derived by bootstrap simulation, and two ranking methods can be compared by permutation test. We found that different metrics emphasize "early recognition" differently. BEDROC and RIE are statistically equivalent metrics. Our newly proposed metric, SLR, is superior to pROC. Through extensive simulations, we observed a "seesaw effect": overemphasizing early recognition reduces the statistical power of a metric to detect true early recognition. CONCLUSION: The statistical framework developed and tested here is applicable to any other metric, even when its exact distribution is unknown. Under this framework, a threshold can easily be selected according to a pre-specified type I error rate, and statistical comparison between two ranking methods becomes possible. The theoretical null distribution of the SLR metric is available, so the SLR threshold can be determined exactly without resorting to bootstrap simulations, which makes it easy to use in practical virtual screening studies. |
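The workflow in the description above (a bootstrap-derived threshold for "better than random", then a permutation test between two ranking methods) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard exponential-weight RIE metric mentioned in the abstract, and the library sizes, function names (`rie`, `perm_test`), and simulation counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rie(ranks, n_total, alpha=20.0):
    """Robust Initial Enhancement: exponentially weights early ranks.

    ranks: 1-based ranks of the active compounds in the screened list.
    """
    n_actives = len(ranks)
    s = np.exp(-alpha * np.asarray(ranks) / n_total).sum()
    # Expected value of the same sum under uniformly random ranking
    rand = (n_actives / n_total) * (1 - np.exp(-alpha)) / (np.exp(alpha / n_total) - 1)
    return s / rand

# --- bootstrap null: threshold for "better than random ranking" ---
N, n = 1000, 50  # library size and number of actives (assumed example sizes)
null = np.array([
    rie(rng.choice(N, size=n, replace=False) + 1, N)  # random 1-based ranks
    for _ in range(2000)
])
threshold = np.quantile(null, 0.95)  # pre-specified type I error rate of 0.05

def perm_test(ranks_a, ranks_b, n_total, n_perm=2000):
    """Two-sided permutation p-value for the RIE difference of two methods."""
    observed = rie(ranks_a, n_total) - rie(ranks_b, n_total)
    diffs = np.empty(n_perm)
    for i in range(n_perm):
        # Randomly swap each active's rank between the two methods
        flip = rng.random(len(ranks_a)) < 0.5
        pa = np.where(flip, ranks_b, ranks_a)
        pb = np.where(flip, ranks_a, ranks_b)
        diffs[i] = rie(pa, n_total) - rie(pb, n_total)
    return np.mean(np.abs(diffs) >= abs(observed))
```

A method whose observed RIE exceeds `threshold` is declared better than random at the chosen type I error rate; `perm_test` then compares two methods directly, which is the part the ROC AUC alone does not provide.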
format | Text |
id | pubmed-2722655 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-27226552009-08-07 A statistical framework to evaluate virtual screening Zhao, Wei; Hevener, Kirk E; White, Stephen W; Lee, Richard E; Boyett, James M BMC Bioinformatics Methodology Article BACKGROUND: The receiver operating characteristic (ROC) curve is widely used to evaluate virtual screening (VS) studies. However, the method fails to address the "early recognition" problem specific to VS. Although many other metrics that emphasize "early recognition", such as RIE, BEDROC, and pROC, have been proposed, there are no rigorous statistical guidelines for determining thresholds or performing significance tests. Nor have these metrics been compared under a statistical framework to better understand their performance. RESULTS: We propose a statistical framework for evaluating VS studies in which the threshold for deciding whether a ranking method is better than random ranking can be derived by bootstrap simulation, and two ranking methods can be compared by permutation test. We found that different metrics emphasize "early recognition" differently. BEDROC and RIE are statistically equivalent metrics. Our newly proposed metric, SLR, is superior to pROC. Through extensive simulations, we observed a "seesaw effect": overemphasizing early recognition reduces the statistical power of a metric to detect true early recognition. CONCLUSION: The statistical framework developed and tested here is applicable to any other metric, even when its exact distribution is unknown. Under this framework, a threshold can easily be selected according to a pre-specified type I error rate, and statistical comparison between two ranking methods becomes possible. The theoretical null distribution of the SLR metric is available, so the SLR threshold can be determined exactly without resorting to bootstrap simulations, which makes it easy to use in practical virtual screening studies.
BioMed Central 2009-07-20 /pmc/articles/PMC2722655/ /pubmed/19619306 http://dx.doi.org/10.1186/1471-2105-10-225 Text en Copyright © 2009 Zhao et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Zhao, Wei; Hevener, Kirk E; White, Stephen W; Lee, Richard E; Boyett, James M A statistical framework to evaluate virtual screening |
title | A statistical framework to evaluate virtual screening |
title_full | A statistical framework to evaluate virtual screening |
title_fullStr | A statistical framework to evaluate virtual screening |
title_full_unstemmed | A statistical framework to evaluate virtual screening |
title_short | A statistical framework to evaluate virtual screening |
title_sort | statistical framework to evaluate virtual screening |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2722655/ https://www.ncbi.nlm.nih.gov/pubmed/19619306 http://dx.doi.org/10.1186/1471-2105-10-225 |
work_keys_str_mv | AT zhaowei astatisticalframeworktoevaluatevirtualscreening AT hevenerkirke astatisticalframeworktoevaluatevirtualscreening AT whitestephenw astatisticalframeworktoevaluatevirtualscreening AT leericharde astatisticalframeworktoevaluatevirtualscreening AT boyettjamesm astatisticalframeworktoevaluatevirtualscreening AT zhaowei statisticalframeworktoevaluatevirtualscreening AT hevenerkirke statisticalframeworktoevaluatevirtualscreening AT whitestephenw statisticalframeworktoevaluatevirtualscreening AT leericharde statisticalframeworktoevaluatevirtualscreening AT boyettjamesm statisticalframeworktoevaluatevirtualscreening |