
Improved shrunken centroid classifiers for high-dimensional class-imbalanced data

BACKGROUND: PAM, a nearest shrunken centroid method (NSC), is a popular classification method for high-dimensional data. ALP and AHP are NSC algorithms that were proposed to improve upon PAM. The NSC methods base their classification rules on shrunken centroids; in practice the amount of shrinkage i...


Bibliographic Details
Main Authors: Blagus, Rok, Lusa, Lara
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3687811/
https://www.ncbi.nlm.nih.gov/pubmed/23433084
http://dx.doi.org/10.1186/1471-2105-14-64
_version_ 1782273989076320256
author Blagus, Rok
Lusa, Lara
collection PubMed
description BACKGROUND: PAM, a nearest shrunken centroid method (NSC), is a popular classification method for high-dimensional data. ALP and AHP are NSC algorithms that were proposed to improve upon PAM. The NSC methods base their classification rules on shrunken centroids; in practice the amount of shrinkage is estimated by minimizing the overall cross-validated (CV) error rate. RESULTS: We show that when data are class-imbalanced, the three NSC classifiers are biased towards the majority class. The bias is larger when the number of variables or class-imbalance is larger and/or the differences between classes are smaller. To diminish the class-imbalance problem of the NSC classifiers, we propose to estimate the amount of shrinkage by maximizing the CV geometric mean of the class-specific predictive accuracies (g-means). CONCLUSIONS: The results obtained on simulated and real high-dimensional class-imbalanced data show that our approach outperforms the currently used strategy based on the minimization of the overall error rate when NSC classifiers are biased towards the majority class. The number of variables included in the NSC classifiers when using our approach is much smaller than with the original approach. This result is supported by experiments on simulated and real high-dimensional class-imbalanced data.
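The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that cross-validated predictions have already been obtained for each candidate shrinkage threshold, and the names `g_means`, `error_rate`, `pick_threshold`, and the toy labels below are hypothetical, invented only for illustration. It compares choosing the threshold by overall error rate with choosing it by g-means.

```python
from math import prod


def class_accuracies(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (1 if t == p else 0)
    return {c: hits[c] / totals[c] for c in totals}


def g_means(y_true, y_pred):
    """Geometric mean of the class-specific predictive accuracies."""
    accs = list(class_accuracies(y_true, y_pred).values())
    return prod(accs) ** (1.0 / len(accs))


def error_rate(y_true, y_pred):
    """Overall fraction of misclassified samples."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_threshold(y_true, cv_predictions, score):
    """Return the threshold whose CV predictions maximize `score`.

    cv_predictions maps each candidate threshold to the list of
    cross-validated predictions obtained with that threshold.
    """
    return max(cv_predictions, key=lambda thr: score(y_true, cv_predictions[thr]))


# Toy data (invented): 8 majority-class "A" samples, 2 minority-class "B" samples.
y_true = ["A"] * 8 + ["B"] * 2
cv_predictions = {
    # Mild shrinkage: 3 majority samples misclassified, both minority samples correct.
    0.5: ["A"] * 5 + ["B"] * 3 + ["B"] * 2,
    # Strong shrinkage: everything is assigned to the majority class.
    2.0: ["A"] * 10,
}

by_error = pick_threshold(y_true, cv_predictions,
                          lambda t, p: -error_rate(t, p))
by_gmeans = pick_threshold(y_true, cv_predictions, g_means)
```

In this toy case the error-rate criterion prefers the threshold that assigns every sample to the majority class (error 0.2 versus 0.3), while g-means is zero for that threshold and therefore selects the other one. This is the kind of majority-class bias the abstract describes.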
format Online
Article
Text
id pubmed-3687811
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3687811 2013-06-26 Improved shrunken centroid classifiers for high-dimensional class-imbalanced data Blagus, Rok Lusa, Lara BMC Bioinformatics Research Article BioMed Central 2013-02-23 /pmc/articles/PMC3687811/ /pubmed/23433084 http://dx.doi.org/10.1186/1471-2105-14-64 Text en Copyright © 2013 Blagus and Lusa; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improved shrunken centroid classifiers for high-dimensional class-imbalanced data
topic Research Article