Estimating misclassification error: a closer look at cross-validation based methods

BACKGROUND: To estimate a classifier’s error in predicting future observations, bootstrap methods have been proposed as reduced-variation alternatives to traditional cross-validation (CV) methods based on sampling without replacement. Monte Carlo (MC) simulation studies aimed at estimating the true...

Bibliographic Details
Main Authors:	Ounpraseuth, Songthip, Lensing, Shelly Y, Spencer, Horace J, Kodell, Ralph L
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Short Report
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3556102/ https://www.ncbi.nlm.nih.gov/pubmed/23190936 http://dx.doi.org/10.1186/1756-0500-5-656

_version_	1782257145337610240
author	Ounpraseuth, Songthip Lensing, Shelly Y Spencer, Horace J Kodell, Ralph L
author_facet	Ounpraseuth, Songthip Lensing, Shelly Y Spencer, Horace J Kodell, Ralph L
author_sort	Ounpraseuth, Songthip
collection	PubMed
description	BACKGROUND: To estimate a classifier’s error in predicting future observations, bootstrap methods have been proposed as reduced-variation alternatives to traditional cross-validation (CV) methods based on sampling without replacement. Monte Carlo (MC) simulation studies aimed at estimating the true misclassification error conditional on the training set are commonly used to compare CV methods. We conducted an MC simulation study to compare a new method of bootstrap CV (BCV) to k-fold CV for estimating classification error. FINDINGS: For the low-dimensional conditions simulated, the modest positive bias of k-fold CV contrasted sharply with the substantial negative bias of the new BCV method. This behavior was corroborated using a real-world dataset of prognostic gene-expression profiles in breast cancer patients. Our simulation results demonstrate some extreme characteristics of variance and bias that can occur due to a fault in the design of CV exercises aimed at estimating the true conditional error of a classifier, and that appear not to have been fully appreciated in previous studies. Although CV is a sound practice for estimating a classifier’s generalization error, using CV to estimate the fixed misclassification error of a trained classifier conditional on the training set is problematic. While MC simulation of this estimation exercise can correctly represent the average bias of a classifier, it will overstate the between-run variance of the bias. CONCLUSIONS: We recommend k-fold CV over the new BCV method for estimating a classifier’s generalization error. The extreme negative bias of BCV is too high a price to pay for its reduced variance.
format	Online Article Text
id	pubmed-3556102
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3556102 2013-01-31 Estimating misclassification error: a closer look at cross-validation based methods Ounpraseuth, Songthip Lensing, Shelly Y Spencer, Horace J Kodell, Ralph L BMC Res Notes Short Report BACKGROUND: To estimate a classifier’s error in predicting future observations, bootstrap methods have been proposed as reduced-variation alternatives to traditional cross-validation (CV) methods based on sampling without replacement. Monte Carlo (MC) simulation studies aimed at estimating the true misclassification error conditional on the training set are commonly used to compare CV methods. We conducted an MC simulation study to compare a new method of bootstrap CV (BCV) to k-fold CV for estimating classification error. FINDINGS: For the low-dimensional conditions simulated, the modest positive bias of k-fold CV contrasted sharply with the substantial negative bias of the new BCV method. This behavior was corroborated using a real-world dataset of prognostic gene-expression profiles in breast cancer patients. Our simulation results demonstrate some extreme characteristics of variance and bias that can occur due to a fault in the design of CV exercises aimed at estimating the true conditional error of a classifier, and that appear not to have been fully appreciated in previous studies. Although CV is a sound practice for estimating a classifier’s generalization error, using CV to estimate the fixed misclassification error of a trained classifier conditional on the training set is problematic. While MC simulation of this estimation exercise can correctly represent the average bias of a classifier, it will overstate the between-run variance of the bias. CONCLUSIONS: We recommend k-fold CV over the new BCV method for estimating a classifier’s generalization error. The extreme negative bias of BCV is too high a price to pay for its reduced variance.
BioMed Central 2012-11-28 /pmc/articles/PMC3556102/ /pubmed/23190936 http://dx.doi.org/10.1186/1756-0500-5-656 Text en Copyright ©2012 Ounpraseuth et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Short Report Ounpraseuth, Songthip Lensing, Shelly Y Spencer, Horace J Kodell, Ralph L Estimating misclassification error: a closer look at cross-validation based methods
title	Estimating misclassification error: a closer look at cross-validation based methods
title_full	Estimating misclassification error: a closer look at cross-validation based methods
title_fullStr	Estimating misclassification error: a closer look at cross-validation based methods
title_full_unstemmed	Estimating misclassification error: a closer look at cross-validation based methods
title_short	Estimating misclassification error: a closer look at cross-validation based methods
title_sort	estimating misclassification error: a closer look at cross-validation based methods
topic	Short Report
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3556102/ https://www.ncbi.nlm.nih.gov/pubmed/23190936 http://dx.doi.org/10.1186/1756-0500-5-656
work_keys_str_mv	AT ounpraseuthsongthip estimatingmisclassificationerroracloserlookatcrossvalidationbasedmethods AT lensingshellyy estimatingmisclassificationerroracloserlookatcrossvalidationbasedmethods AT spencerhoracej estimatingmisclassificationerroracloserlookatcrossvalidationbasedmethods AT kodellralphl estimatingmisclassificationerroracloserlookatcrossvalidationbasedmethods
