Which Is Better: Holdout or Full-Sample Classifier Design?
Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset? Full-sample design provides the better classifier; nevertheless, one might choose holdout with the hope of better error esti...
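The comparison posed above can be illustrated with a small simulation. What follows is a minimal sketch under assumptions that are not taken from the article: a one-dimensional model with two equally likely Gaussian classes (means 0 and `delta`, unit variance), a nearest-mean classification rule, and a holdout split that trains on half of each class. All function names are hypothetical. The sketch compares only the expected true error of the two designs and the average holdout estimate; it does not reproduce the article's expected-bound criterion, its covariance models, or its patient-data model.

```python
import math
import random


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def design(x0, x1):
    """Nearest-mean rule in 1-D: return (threshold, sign).

    sign is +1 when class 1 lies to the right of the threshold, else -1.
    """
    m0 = sum(x0) / len(x0)
    m1 = sum(x1) / len(x1)
    return (m0 + m1) / 2.0, (1 if m1 >= m0 else -1)


def predict(clf, x):
    """Label x as class 1 when it falls on the class-1 side of the threshold."""
    t, s = clf
    return 1 if s * (x - t) > 0 else 0


def true_error(clf, delta):
    """Exact error under the assumed model N(0,1) vs N(delta,1), equal priors."""
    t, s = clf
    err = 0.5 * (1.0 - phi(t)) + 0.5 * phi(t - delta)
    return err if s == 1 else 1.0 - err


def holdout_estimate(clf, x0, x1):
    """Fraction of held-out points that the classifier mislabels."""
    wrong = sum(predict(clf, x) != 0 for x in x0)
    wrong += sum(predict(clf, x) != 1 for x in x1)
    return wrong / (len(x0) + len(x1))


def simulate(n_per_class=10, delta=1.0, reps=5000, seed=0):
    """Average true error of both designs, plus the average holdout estimate."""
    rng = random.Random(seed)
    half = n_per_class // 2
    full_sum = hold_sum = est_sum = 0.0
    for _ in range(reps):
        x0 = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
        x1 = [rng.gauss(delta, 1.0) for _ in range(n_per_class)]
        full = design(x0, x1)                # designed on the full sample
        hold = design(x0[:half], x1[:half])  # designed on the training half
        full_sum += true_error(full, delta)
        hold_sum += true_error(hold, delta)
        est_sum += holdout_estimate(hold, x0[half:], x1[half:])
    return full_sum / reps, hold_sum / reps, est_sum / reps


if __name__ == "__main__":
    full, hold, est = simulate()
    print(f"mean true error, full-sample design: {full:.4f}")
    print(f"mean true error, holdout design:     {hold:.4f}")
    print(f"mean holdout error estimate:         {est:.4f}")
```

In this toy setting the full-sample classifier should show the lower average true error, while the holdout estimate tracks the true error of the weaker, half-sample classifier. Whether that trade-off favors one design depends on the error estimator used with the full sample, which this sketch does not model.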
Main authors: | Brun, Marcel; Xu, Qian; Dougherty, Edward R |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171393/ https://www.ncbi.nlm.nih.gov/pubmed/18483613 http://dx.doi.org/10.1155/2008/297945 |
_version_ | 1782211752508784640 |
---|---|
author | Brun, Marcel Xu, Qian Dougherty, Edward R |
author_facet | Brun, Marcel Xu, Qian Dougherty, Edward R |
author_sort | Brun, Marcel |
collection | PubMed |
description | Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset? Full-sample design provides the better classifier; nevertheless, one might choose holdout with the hope of better error estimation. A conservative criterion to decide the best course is to aim at a classifier whose error is less than a given bound. Then the choice between full-sample and holdout designs depends on which possesses the smaller expected bound. Using this criterion, we examine the choice between holdout and several full-sample error estimators using covariance models and a patient-data model. Full-sample design consistently outperforms holdout design. The relation between the two designs is revealed via a decomposition of the expected bound into the sum of the expected true error and the expected conditional standard deviation of the true error. |
format | Online Article Text |
id | pubmed-3171393 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | Springer |
record_format | MEDLINE/PubMed |
spelling | pubmed-3171393 2011-09-13 Which Is Better: Holdout or Full-Sample Classifier Design? Brun, Marcel Xu, Qian Dougherty, Edward R EURASIP J Bioinform Syst Biol Research Article Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset? Full-sample design provides the better classifier; nevertheless, one might choose holdout with the hope of better error estimation. A conservative criterion to decide the best course is to aim at a classifier whose error is less than a given bound. Then the choice between full-sample and holdout designs depends on which possesses the smaller expected bound. Using this criterion, we examine the choice between holdout and several full-sample error estimators using covariance models and a patient-data model. Full-sample design consistently outperforms holdout design. The relation between the two designs is revealed via a decomposition of the expected bound into the sum of the expected true error and the expected conditional standard deviation of the true error. Springer 2007-12-12 /pmc/articles/PMC3171393/ /pubmed/18483613 http://dx.doi.org/10.1155/2008/297945 Text en Copyright © 2008 Marcel Brun et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Brun, Marcel Xu, Qian Dougherty, Edward R Which Is Better: Holdout or Full-Sample Classifier Design? |
title | Which Is Better: Holdout or Full-Sample Classifier Design? |
title_full | Which Is Better: Holdout or Full-Sample Classifier Design? |
title_fullStr | Which Is Better: Holdout or Full-Sample Classifier Design? |
title_full_unstemmed | Which Is Better: Holdout or Full-Sample Classifier Design? |
title_short | Which Is Better: Holdout or Full-Sample Classifier Design? |
title_sort | which is better: holdout or full-sample classifier design? |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171393/ https://www.ncbi.nlm.nih.gov/pubmed/18483613 http://dx.doi.org/10.1155/2008/297945 |
work_keys_str_mv | AT brunmarcel whichisbetterholdoutorfullsampleclassifierdesign AT xuqian whichisbetterholdoutorfullsampleclassifierdesign AT doughertyedwardr whichisbetterholdoutorfullsampleclassifierdesign |