
On the Number of Close-to-Optimal Feature Sets

The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a certain size chosen from a large collection of potential f...


Bibliographic Details
Main Authors:	Dougherty, Edward R., Brun, Marcel
Format:	Text
Language:	English
Published:	Libertas Academica 2007
Subjects:	Rapid Communication
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675502/ https://www.ncbi.nlm.nih.gov/pubmed/19458767

author	Dougherty, Edward R. Brun, Marcel
collection	PubMed
description	The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a certain size chosen from a large collection of potential features can be so close to being optimal that they are statistically indistinguishable. Feature-set optimality is inherently related to sample size because it only arises on account of the tendency for diminished classifier accuracy as the number of features grows too large for satisfactory design from the sample data. The paper considers optimal feature sets in the framework of a model in which the features are grouped in such a way that intra-group correlation is substantial whereas inter-group correlation is minimal, the intent being to model the situation in which there are groups of highly correlated co-regulated genes and there is little correlation between the co-regulated groups. This is accomplished by using a block model for the covariance matrix that reflects these conditions. Focusing on linear discriminant analysis, we demonstrate how these assumptions can lead to very large numbers of close-to-optimal feature sets.
format	Text
id	pubmed-2675502
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	Libertas Academica
record_format	MEDLINE/PubMed
spelling	pubmed-2675502 2009-05-20 On the Number of Close-to-Optimal Feature Sets Dougherty, Edward R. Brun, Marcel Cancer Inform Rapid Communication Libertas Academica 2007-02-16 /pmc/articles/PMC2675502/ /pubmed/19458767 Text en © 2006 The authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
title	On the Number of Close-to-Optimal Feature Sets
topic	Rapid Communication
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675502/ https://www.ncbi.nlm.nih.gov/pubmed/19458767
