
Which Is Better: Holdout or Full-Sample Classifier Design?

Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset? Full-sample design provides the better classifier; nevertheless, one might choose holdout with the hope of better error esti...


Bibliographic Details
Main Authors: Brun, Marcel; Xu, Qian; Dougherty, Edward R
Format: Online Article Text
Language: English
Published: Springer 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171393/
https://www.ncbi.nlm.nih.gov/pubmed/18483613
http://dx.doi.org/10.1155/2008/297945
_version_ 1782211752508784640
author Brun, Marcel
Xu, Qian
Dougherty, Edward R
author_facet Brun, Marcel
Xu, Qian
Dougherty, Edward R
author_sort Brun, Marcel
collection PubMed
description Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset? Full-sample design provides the better classifier; nevertheless, one might choose holdout with the hope of better error estimation. A conservative criterion to decide the best course is to aim at a classifier whose error is less than a given bound. Then the choice between full-sample and holdout designs depends on which possesses the smaller expected bound. Using this criterion, we examine the choice between holdout and several full-sample error estimators using covariance models and a patient-data model. Full-sample design consistently outperforms holdout design. The relation between the two designs is revealed via a decomposition of the expected bound into the sum of the expected true error and the expected conditional standard deviation of the true error.
format Online
Article
Text
id pubmed-3171393
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Springer
record_format MEDLINE/PubMed
spelling pubmed-3171393 2011-09-13 Which Is Better: Holdout or Full-Sample Classifier Design? Brun, Marcel Xu, Qian Dougherty, Edward R EURASIP J Bioinform Syst Biol Research Article Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset? Full-sample design provides the better classifier; nevertheless, one might choose holdout with the hope of better error estimation. A conservative criterion to decide the best course is to aim at a classifier whose error is less than a given bound. Then the choice between full-sample and holdout designs depends on which possesses the smaller expected bound. Using this criterion, we examine the choice between holdout and several full-sample error estimators using covariance models and a patient-data model. Full-sample design consistently outperforms holdout design. The relation between the two designs is revealed via a decomposition of the expected bound into the sum of the expected true error and the expected conditional standard deviation of the true error. Springer 2007-12-12 /pmc/articles/PMC3171393/ /pubmed/18483613 http://dx.doi.org/10.1155/2008/297945 Text en Copyright © 2008 Marcel Brun et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Brun, Marcel
Xu, Qian
Dougherty, Edward R
Which Is Better: Holdout or Full-Sample Classifier Design?
title Which Is Better: Holdout or Full-Sample Classifier Design?
title_full Which Is Better: Holdout or Full-Sample Classifier Design?
title_fullStr Which Is Better: Holdout or Full-Sample Classifier Design?
title_full_unstemmed Which Is Better: Holdout or Full-Sample Classifier Design?
title_short Which Is Better: Holdout or Full-Sample Classifier Design?
title_sort which is better: holdout or full-sample classifier design?
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171393/
https://www.ncbi.nlm.nih.gov/pubmed/18483613
http://dx.doi.org/10.1155/2008/297945
work_keys_str_mv AT brunmarcel whichisbetterholdoutorfullsampleclassifierdesign
AT xuqian whichisbetterholdoutorfullsampleclassifierdesign
AT doughertyedwardr whichisbetterholdoutorfullsampleclassifierdesign