
On the assessment of the added value of new predictive biomarkers

Bibliographic Details

Main Authors: Chen, Weijie, Samuelson, Frank W, Gallas, Brandon D, Kang, Le, Sahiner, Berkman, Petrick, Nicholas

Format: Online Article Text

Language: English

Published: BioMed Central 2013

Subjects:

Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3733611/
https://www.ncbi.nlm.nih.gov/pubmed/23895587
http://dx.doi.org/10.1186/1471-2288-13-98
author Chen, Weijie
Samuelson, Frank W
Gallas, Brandon D
Kang, Le
Sahiner, Berkman
Petrick, Nicholas
author_facet Chen, Weijie
Samuelson, Frank W
Gallas, Brandon D
Kang, Le
Sahiner, Berkman
Petrick, Nicholas
author_sort Chen, Weijie
collection PubMed
description BACKGROUND: The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al. concluded that this inconsistency is because AUC “has vastly inferior statistical properties,” i.e., it is extremely conservative. This statement is based on simulations that misuse the DeLong et al. method. Our purpose is to provide a fair comparison of the likelihood ratio (LR) test and the Wald test versus diagnostic accuracy (AUC) tests. DISCUSSION: We present a test to compare ideal AUCs of nested linear discriminant functions via an F test. We compare it with the LR test and the Wald test for the logistic regression model. The null hypotheses of these three tests are equivalent; however, the F test is an exact test whereas the LR test and the Wald test are asymptotic tests. Our simulation shows that the F test has the nominal type I error even with a small sample size. Our results also indicate that the LR test and the Wald test have inflated type I errors when the sample size is small, while the type I error converges to the nominal value asymptotically with increasing sample size as expected. We further show that the DeLong et al. method tests a different hypothesis and has the nominal type I error when it is used within its designed scope. Finally, we summarize the pros and cons of all four methods we consider in this paper. SUMMARY: We show that there is nothing inherently less powerful or disagreeable about ROC analysis for showing the usefulness of new biomarkers or characterizing the performance of classification models. Each statistical method for assessing biomarkers and classification models has its own strengths and weaknesses. Investigators need to choose methods based on the assessment purpose, the biomarker development phase at which the assessment is being performed, the available patient data, and the validity of assumptions behind the methodologies.
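To make the comparison described in the abstract concrete, below is a minimal, illustrative sketch in Python (not the authors' code) of testing the added value of a new biomarker in nested logistic regression models: an asymptotic likelihood ratio (LR) test and a Wald test for the new coefficient, alongside point estimates of the AUCs of the reduced and full risk scores, which a DeLong-style test would formally compare. The simulated data, sample size, and library choices (numpy, scipy, statsmodels, scikit-learn) are assumptions made for this example only.

# Illustrative sketch only: LR test and Wald test for a new biomarker in a
# nested logistic regression, plus AUCs of the reduced and full risk scores.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                 # binary disease status
x_old = y + rng.normal(scale=2.0, size=n)      # existing (weak) biomarker
x_new = rng.normal(size=n)                     # candidate biomarker with no added value (null holds)

X_reduced = sm.add_constant(np.column_stack([x_old]))
X_full = sm.add_constant(np.column_stack([x_old, x_new]))

fit_reduced = sm.Logit(y, X_reduced).fit(disp=False)
fit_full = sm.Logit(y, X_full).fit(disp=False)

# Likelihood ratio test: 2*(llf_full - llf_reduced) ~ chi2(1) asymptotically.
lr_stat = 2.0 * (fit_full.llf - fit_reduced.llf)
lr_p = stats.chi2.sf(lr_stat, df=1)

# Wald test: p-value for the new biomarker's coefficient in the full model.
wald_p = fit_full.pvalues[-1]

# AUCs of the two fitted risk scores; a DeLong-style method would test the
# difference between these correlated AUCs (only point estimates shown here).
auc_reduced = roc_auc_score(y, fit_reduced.predict(X_reduced))
auc_full = roc_auc_score(y, fit_full.predict(X_full))

print(f"LR test p = {lr_p:.3f}, Wald test p = {wald_p:.3f}")
print(f"AUC reduced = {auc_reduced:.3f}, AUC full = {auc_full:.3f}")

Because the candidate biomarker is simulated as pure noise, repeating this sketch over many replications would estimate the type I error of each test, which is the kind of comparison the abstract summarizes (exact F test versus asymptotic LR and Wald tests, and the DeLong et al. method used within its designed scope).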
format Online
Article
Text
id pubmed-3733611
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-37336112013-08-06 On the assessment of the added value of new predictive biomarkers Chen, Weijie Samuelson, Frank W Gallas, Brandon D Kang, Le Sahiner, Berkman Petrick, Nicholas BMC Med Res Methodol Debate BACKGROUND: The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al. concluded that this inconsistency is because AUC “has vastly inferior statistical properties,” i.e., it is extremely conservative. This statement is based on simulations that misuse the DeLong et al. method. Our purpose is to provide a fair comparison of the likelihood ratio (LR) test and the Wald test versus diagnostic accuracy (AUC) tests. DISCUSSION: We present a test to compare ideal AUCs of nested linear discriminant functions via an F test. We compare it with the LR test and the Wald test for the logistic regression model. The null hypotheses of these three tests are equivalent; however, the F test is an exact test whereas the LR test and the Wald test are asymptotic tests. Our simulation shows that the F test has the nominal type I error even with a small sample size. Our results also indicate that the LR test and the Wald test have inflated type I errors when the sample size is small, while the type I error converges to the nominal value asymptotically with increasing sample size as expected. We further show that the DeLong et al. method tests a different hypothesis and has the nominal type I error when it is used within its designed scope. Finally, we summarize the pros and cons of all four methods we consider in this paper. SUMMARY: We show that there is nothing inherently less powerful or disagreeable about ROC analysis for showing the usefulness of new biomarkers or characterizing the performance of classification models. Each statistical method for assessing biomarkers and classification models has its own strengths and weaknesses. Investigators need to choose methods based on the assessment purpose, the biomarker development phase at which the assessment is being performed, the available patient data, and the validity of assumptions behind the methodologies. BioMed Central 2013-07-29 /pmc/articles/PMC3733611/ /pubmed/23895587 http://dx.doi.org/10.1186/1471-2288-13-98 Text en Copyright © 2013 Chen et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Debate
Chen, Weijie
Samuelson, Frank W
Gallas, Brandon D
Kang, Le
Sahiner, Berkman
Petrick, Nicholas
On the assessment of the added value of new predictive biomarkers
title On the assessment of the added value of new predictive biomarkers
title_full On the assessment of the added value of new predictive biomarkers
title_fullStr On the assessment of the added value of new predictive biomarkers
title_full_unstemmed On the assessment of the added value of new predictive biomarkers
title_short On the assessment of the added value of new predictive biomarkers
title_sort on the assessment of the added value of new predictive biomarkers
topic Debate
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3733611/
https://www.ncbi.nlm.nih.gov/pubmed/23895587
http://dx.doi.org/10.1186/1471-2288-13-98
work_keys_str_mv AT chenweijie ontheassessmentoftheaddedvalueofnewpredictivebiomarkers
AT samuelsonfrankw ontheassessmentoftheaddedvalueofnewpredictivebiomarkers
AT gallasbrandond ontheassessmentoftheaddedvalueofnewpredictivebiomarkers
AT kangle ontheassessmentoftheaddedvalueofnewpredictivebiomarkers
AT sahinerberkman ontheassessmentoftheaddedvalueofnewpredictivebiomarkers
AT petricknicholas ontheassessmentoftheaddedvalueofnewpredictivebiomarkers