On the Number of Close-to-Optimal Feature Sets
The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a certain size chosen from a large collection of potential f...
Main Authors: | Dougherty, Edward R., Brun, Marcel |
---|---|
Format: | Text |
Language: | English |
Published: |
Libertas Academica
2007
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675502/ https://www.ncbi.nlm.nih.gov/pubmed/19458767 |
_version_ | 1782166702770880512 |
---|---|
author | Dougherty, Edward R. Brun, Marcel |
author_facet | Dougherty, Edward R. Brun, Marcel |
author_sort | Dougherty, Edward R. |
collection | PubMed |
description | The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a certain size chosen from a large collection of potential features can be so close to being optimal that they are statistically indistinguishable. Feature-set optimality is inherently related to sample size because it only arises on account of the tendency for diminished classifier accuracy as the number of features grows too large for satisfactory design from the sample data. The paper considers optimal feature sets in the framework of a model in which the features are grouped in such a way that intra-group correlation is substantial whereas inter-group correlation is minimal, the intent being to model the situation in which there are groups of highly correlated co-regulated genes and there is little correlation between the co-regulated groups. This is accomplished by using a block model for the covariance matrix that reflects these conditions. Focusing on linear discriminant analysis, we demonstrate how these assumptions can lead to very large numbers of close-to-optimal feature sets. |
format | Text |
id | pubmed-2675502 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-26755022009-05-20 On the Number of Close-to-Optimal Feature Sets Dougherty, Edward R. Brun, Marcel Cancer Inform Rapid Communication The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a certain size chosen from a large collection of potential features can be so close to being optimal that they are statistically indistinguishable. Feature-set optimality is inherently related to sample size because it only arises on account of the tendency for diminished classifier accuracy as the number of features grows too large for satisfactory design from the sample data. The paper considers optimal feature sets in the framework of a model in which the features are grouped in such a way that intra-group correlation is substantial whereas inter-group correlation is minimal, the intent being to model the situation in which there are groups of highly correlated co-regulated genes and there is little correlation between the co-regulated groups. This is accomplished by using a block model for the covariance matrix that reflects these conditions. Focusing on linear discriminant analysis, we demonstrate how these assumptions can lead to very large numbers of close-to-optimal feature sets. Libertas Academica 2007-02-16 /pmc/articles/PMC2675502/ /pubmed/19458767 Text en © 2006 The authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/). |
spellingShingle | Rapid Communication Dougherty, Edward R. Brun, Marcel On the Number of Close-to-Optimal Feature Sets |
title | On the Number of Close-to-Optimal Feature Sets |
title_full | On the Number of Close-to-Optimal Feature Sets |
title_fullStr | On the Number of Close-to-Optimal Feature Sets |
title_full_unstemmed | On the Number of Close-to-Optimal Feature Sets |
title_short | On the Number of Close-to-Optimal Feature Sets |
title_sort | on the number of close-to-optimal feature sets |
topic | Rapid Communication |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675502/ https://www.ncbi.nlm.nih.gov/pubmed/19458767 |
work_keys_str_mv | AT doughertyedwardr onthenumberofclosetooptimalfeaturesets AT brunmarcel onthenumberofclosetooptimalfeaturesets |