On the Number of Close-to-Optimal Feature Sets

The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a certain size chosen from a large collection of potential f...

Bibliographic Details
Main authors: Dougherty, Edward R., Brun, Marcel
Format: Text
Language: English
Published: Libertas Academica 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675502/
https://www.ncbi.nlm.nih.gov/pubmed/19458767
author Dougherty, Edward R.
Brun, Marcel
collection PubMed
description The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a certain size chosen from a large collection of potential features can be so close to being optimal that they are statistically indistinguishable. Feature-set optimality is inherently related to sample size because it only arises on account of the tendency for diminished classifier accuracy as the number of features grows too large for satisfactory design from the sample data. The paper considers optimal feature sets in the framework of a model in which the features are grouped in such a way that intra-group correlation is substantial whereas inter-group correlation is minimal, the intent being to model the situation in which there are groups of highly correlated co-regulated genes and there is little correlation between the co-regulated groups. This is accomplished by using a block model for the covariance matrix that reflects these conditions. Focusing on linear discriminant analysis, we demonstrate how these assumptions can lead to very large numbers of close-to-optimal feature sets.
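The block covariance structure named in the description can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the group counts, the correlation value 0.8, and the "one feature per group" counting rule are hypothetical and are not taken from the paper. The sketch only shows why a grouped model can admit a very large number of comparable feature sets.

```python
from math import comb


def block_covariance(num_groups, group_size, rho, variance=1.0):
    """Build a block covariance matrix as a list of lists.

    Features in the same group have correlation rho; features in
    different groups are uncorrelated (an idealized assumption).
    """
    n = num_groups * group_size
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cov[i][j] = variance
            elif i // group_size == j // group_size:
                cov[i][j] = rho * variance
    return cov


def one_per_group_sets(num_groups, group_size, set_size):
    """Count feature sets of size set_size that use distinct groups.

    Choose which groups contribute, then one feature from each.
    """
    return comb(num_groups, set_size) * group_size ** set_size


# Hypothetical sizes: 3 groups of 2 features for the matrix,
# 20 groups of 10 features for the count.
cov = block_covariance(num_groups=3, group_size=2, rho=0.8)
count = one_per_group_sets(num_groups=20, group_size=10, set_size=5)
```

If features within a group are nearly interchangeable, any of the sets counted by `one_per_group_sets` built from the same groups would be expected to perform similarly, which is one informal way to read the abstract's claim.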
format Text
id pubmed-2675502
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-2675502 2009-05-20 On the Number of Close-to-Optimal Feature Sets Dougherty, Edward R. Brun, Marcel Cancer Inform Rapid Communication Libertas Academica 2007-02-16 /pmc/articles/PMC2675502/ /pubmed/19458767 Text en © 2006 The authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
title On the Number of Close-to-Optimal Feature Sets
topic Rapid Communication