Optimality Driven Nearest Centroid Classification from Genomic Data

Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each...

Bibliographic Details
Main Authors: Dabney, Alan R., Storey, John D.
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1991588/
https://www.ncbi.nlm.nih.gov/pubmed/17912341
http://dx.doi.org/10.1371/journal.pone.0001002
_version_ 1782135441809473536
author Dabney, Alan R.
Storey, John D.
author_facet Dabney, Alan R.
Storey, John D.
author_sort Dabney, Alan R.
collection PubMed
description Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.
format Text
id pubmed-1991588
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1991588 2007-10-05 Optimality Driven Nearest Centroid Classification from Genomic Data Dabney, Alan R. Storey, John D. PLoS One Research Article Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers. Public Library of Science 2007-10-03 /pmc/articles/PMC1991588/ /pubmed/17912341 http://dx.doi.org/10.1371/journal.pone.0001002 Text en Dabney, Storey. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Dabney, Alan R.
Storey, John D.
Optimality Driven Nearest Centroid Classification from Genomic Data
title Optimality Driven Nearest Centroid Classification from Genomic Data
title_full Optimality Driven Nearest Centroid Classification from Genomic Data
title_fullStr Optimality Driven Nearest Centroid Classification from Genomic Data
title_full_unstemmed Optimality Driven Nearest Centroid Classification from Genomic Data
title_short Optimality Driven Nearest Centroid Classification from Genomic Data
title_sort optimality driven nearest centroid classification from genomic data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1991588/
https://www.ncbi.nlm.nih.gov/pubmed/17912341
http://dx.doi.org/10.1371/journal.pone.0001002
work_keys_str_mv AT dabneyalanr optimalitydrivennearestcentroidclassificationfromgenomicdata
AT storeyjohnd optimalitydrivennearestcentroidclassificationfromgenomicdata