
Scientific knowledge is possible with small-sample classification

A typical small-sample biomarker classification paper discriminates between types of pathology based on, say, 30,000 genes and a small labeled sample of fewer than 100 points. Some classification rule is used to design the classifier from this data, but we are given no good reason or conditions under...


Bibliographic Details
Main Authors:	Dougherty, Edward R; Dalton, Lori A
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Review
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765562/ https://www.ncbi.nlm.nih.gov/pubmed/23958425 http://dx.doi.org/10.1186/1687-4153-2013-10

author	Dougherty, Edward R; Dalton, Lori A
collection	PubMed
description	A typical small-sample biomarker classification paper discriminates between types of pathology based on, say, 30,000 genes and a small labeled sample of fewer than 100 points. Some classification rule is used to design the classifier from this data, but we are given no good reason or conditions under which this algorithm should perform well. An error estimation rule is used to estimate the classification error on the population using the same data, but once again we are given no good reason or conditions under which this error estimator should produce a good estimate, and thus we do not know how well the classifier should be expected to perform. In fact, in virtually all such papers, the error estimate is expected to be highly inaccurate. In short, we are given no justification for any claims. Given the ubiquity of vacuous small-sample classification papers in the literature, one could easily conclude that scientific knowledge is impossible in small-sample settings. It is not that thousands of papers overtly claim that scientific knowledge is impossible in regard to their content; rather, they use methods that preclude scientific knowledge. In this paper, we argue to the contrary that scientific knowledge in small-sample classification is possible, provided there is sufficient prior knowledge. A natural way to proceed, discussed herein, is via a paradigm for pattern recognition in which we incorporate prior knowledge in the whole classification procedure (classifier design and error estimation), optimize each step of the procedure given the available information, and obtain theoretical measures of performance for both classifiers and error estimators, the latter being the critical epistemological issue. In sum, we can achieve scientific validation for a proposed small-sample classifier and its error estimate.
format	Online Article Text
id	pubmed-3765562
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3765562 2013-09-11 Scientific knowledge is possible with small-sample classification Dougherty, Edward R; Dalton, Lori A EURASIP J Bioinform Syst Biol Review BioMed Central 2013 2013-08-20 /pmc/articles/PMC3765562/ /pubmed/23958425 http://dx.doi.org/10.1186/1687-4153-2013-10 Text en Copyright © 2013 Dougherty and Dalton; licensee Springer. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Scientific knowledge is possible with small-sample classification
topic	Review
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765562/ https://www.ncbi.nlm.nih.gov/pubmed/23958425 http://dx.doi.org/10.1186/1687-4153-2013-10


Similar Items