Scientific knowledge is possible with small-sample classification

A typical small-sample biomarker classification paper discriminates between types of pathology based on, say, 30,000 genes and a small labeled sample of less than 100 points. Some classification rule is used to design the classifier from this data, but we are given no good reason or conditions under...

Bibliographic Details
Main authors: Dougherty, Edward R, Dalton, Lori A
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765562/
https://www.ncbi.nlm.nih.gov/pubmed/23958425
http://dx.doi.org/10.1186/1687-4153-2013-10
_version_ 1782283338949591040
author Dougherty, Edward R
Dalton, Lori A
author_facet Dougherty, Edward R
Dalton, Lori A
author_sort Dougherty, Edward R
collection PubMed
description A typical small-sample biomarker classification paper discriminates between types of pathology based on, say, 30,000 genes and a small labeled sample of fewer than 100 points. Some classification rule is used to design the classifier from this data, but we are given no good reason or conditions under which this algorithm should perform well. An error estimation rule is used to estimate the classification error on the population using the same data, but once again we are given no good reason or conditions under which this error estimator should produce a good estimate, and thus we do not know how well the classifier should be expected to perform. In fact, in virtually all such papers the error estimate is expected to be highly inaccurate. In short, we are given no justification for any claims. Given the ubiquity of vacuous small-sample classification papers in the literature, one could easily conclude that scientific knowledge is impossible in small-sample settings. It is not that thousands of papers overtly claim that scientific knowledge is impossible in regard to their content; rather, it is that they utilize methods that preclude scientific knowledge. In this paper, we argue to the contrary that scientific knowledge in small-sample classification is possible provided there is sufficient prior knowledge. A natural way to proceed, discussed herein, is via a paradigm for pattern recognition in which we incorporate prior knowledge in the whole classification procedure (classifier design and error estimation), optimize each step of the procedure given available information, and obtain theoretical measures of performance for both classifiers and error estimators, the latter being the critical epistemological issue. In sum, we can achieve scientific validation for a proposed small-sample classifier and its error estimate.
format Online
Article
Text
id pubmed-3765562
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-37655622013-09-11 Scientific knowledge is possible with small-sample classification Dougherty, Edward R Dalton, Lori A EURASIP J Bioinform Syst Biol Review A typical small-sample biomarker classification paper discriminates between types of pathology based on, say, 30,000 genes and a small labeled sample of less than 100 points. Some classification rule is used to design the classifier from this data, but we are given no good reason or conditions under which this algorithm should perform well. An error estimation rule is used to estimate the classification error on the population using the same data, but once again we are given no good reason or conditions under which this error estimator should produce a good estimate, and thus we do not know how well the classifier should be expected to perform. In fact, virtually, in all such papers the error estimate is expected to be highly inaccurate. In short, we are given no justification for any claims. Given the ubiquity of vacuous small-sample classification papers in the literature, one could easily conclude that scientific knowledge is impossible in small-sample settings. It is not that thousands of papers overtly claim that scientific knowledge is impossible in regard to their content; rather, it is that they utilize methods that preclude scientific knowledge. In this paper, we argue to the contrary that scientific knowledge in small-sample classification is possible provided there is sufficient prior knowledge. A natural way to proceed, discussed herein, is via a paradigm for pattern recognition in which we incorporate prior knowledge in the whole classification procedure (classifier design and error estimation), optimize each step of the procedure given available information, and obtain theoretical measures of performance for both classifiers and error estimators, the latter being the critical epistemological issue. 
In sum, we can achieve scientific validation for a proposed small-sample classifier and its error estimate. BioMed Central 2013 2013-08-20 /pmc/articles/PMC3765562/ /pubmed/23958425 http://dx.doi.org/10.1186/1687-4153-2013-10 Text en Copyright © 2013 Dougherty and Dalton; licensee Springer. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Review
Dougherty, Edward R
Dalton, Lori A
Scientific knowledge is possible with small-sample classification
title Scientific knowledge is possible with small-sample classification
title_full Scientific knowledge is possible with small-sample classification
title_fullStr Scientific knowledge is possible with small-sample classification
title_full_unstemmed Scientific knowledge is possible with small-sample classification
title_short Scientific knowledge is possible with small-sample classification
title_sort scientific knowledge is possible with small-sample classification
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765562/
https://www.ncbi.nlm.nih.gov/pubmed/23958425
http://dx.doi.org/10.1186/1687-4153-2013-10
work_keys_str_mv AT doughertyedwardr scientificknowledgeispossiblewithsmallsampleclassification
AT daltonloria scientificknowledgeispossiblewithsmallsampleclassification