Optimality Driven Nearest Centroid Classification from Genomic Data
Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each...
Main Authors: | Dabney, Alan R., Storey, John D. |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science 2007 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1991588/ https://www.ncbi.nlm.nih.gov/pubmed/17912341 http://dx.doi.org/10.1371/journal.pone.0001002 |
_version_ | 1782135441809473536 |
---|---|
author | Dabney, Alan R. Storey, John D. |
author_facet | Dabney, Alan R. Storey, John D. |
author_sort | Dabney, Alan R. |
collection | PubMed |
description | Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers. |
format | Text |
id | pubmed-1991588 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-19915882007-10-05 Optimality Driven Nearest Centroid Classification from Genomic Data Dabney, Alan R. Storey, John D. PLoS One Research Article Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers. Public Library of Science 2007-10-03 /pmc/articles/PMC1991588/ /pubmed/17912341 http://dx.doi.org/10.1371/journal.pone.0001002 Text en Dabney, Storey. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Dabney, Alan R. Storey, John D. Optimality Driven Nearest Centroid Classification from Genomic Data |
title | Optimality Driven Nearest Centroid Classification from Genomic Data |
title_full | Optimality Driven Nearest Centroid Classification from Genomic Data |
title_fullStr | Optimality Driven Nearest Centroid Classification from Genomic Data |
title_full_unstemmed | Optimality Driven Nearest Centroid Classification from Genomic Data |
title_short | Optimality Driven Nearest Centroid Classification from Genomic Data |
title_sort | optimality driven nearest centroid classification from genomic data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1991588/ https://www.ncbi.nlm.nih.gov/pubmed/17912341 http://dx.doi.org/10.1371/journal.pone.0001002 |
work_keys_str_mv | AT dabneyalanr optimalitydrivennearestcentroidclassificationfromgenomicdata AT storeyjohnd optimalitydrivennearestcentroidclassificationfromgenomicdata |
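
The description in this record contrasts the article's approach with the common baseline of a nearest-centroid classifier whose features are picked by univariate scores. As a rough illustration of that baseline only (not the authors' optimality-driven greedy selection or their shrinkage centroids), here is a minimal sketch. It assumes a diagonal pooled variance, equal class priors, and an F-like per-feature score; the function names `univariate_scores`, `fit_nearest_centroid`, and `predict`, and the synthetic data, are hypothetical.

```python
import numpy as np


def univariate_scores(X, y):
    """Per-feature F-like score: between-class spread over pooled within-class variance."""
    classes = np.unique(y)
    n, p = X.shape
    overall = X.mean(axis=0)
    between = np.zeros(p)
    within = np.zeros(p)
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        between += len(Xc) * (mu - overall) ** 2
        within += ((Xc - mu) ** 2).sum(axis=0)
    pooled_var = within / (n - len(classes))
    return between / (pooled_var + 1e-12)


def fit_nearest_centroid(X, y, n_features):
    """Keep the top-scoring features (scored one at a time), then store class centroids."""
    scores = univariate_scores(X, y)
    keep = np.argsort(scores)[::-1][:n_features]
    classes = np.unique(y)
    centroids = np.vstack([X[y == c][:, keep].mean(axis=0) for c in classes])
    # Pooled within-class variance of the selected features (diagonal covariance assumption).
    resid = np.concatenate(
        [X[y == c][:, keep] - centroids[i] for i, c in enumerate(classes)]
    )
    var = (resid ** 2).sum(axis=0) / (len(y) - len(classes)) + 1e-12
    return {"classes": classes, "keep": keep, "centroids": centroids, "var": var}


def predict(model, X):
    """Assign each sample to the class with the nearest variance-scaled centroid."""
    Z = X[:, model["keep"]]
    diff = Z[:, None, :] - model["centroids"][None, :, :]
    dist = (diff ** 2 / model["var"]).sum(axis=2)
    return model["classes"][dist.argmin(axis=1)]


# Synthetic stand-in for expression data: 40 samples, 50 features,
# of which only the first 3 differ between the two classes (assumed setup).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 50))
X[y == 1, :3] += 5.0

model = fit_nearest_centroid(X, y, n_features=3)
accuracy = (predict(model, X) == y).mean()
```

Because each feature is scored on its own, this baseline ignores how the chosen subset performs jointly, which is the limitation the description says the article addresses.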