Improved shrunken centroid classifiers for high-dimensional class-imbalanced data
BACKGROUND: PAM, a nearest shrunken centroid method (NSC), is a popular classification method for high-dimensional data. ALP and AHP are NSC algorithms that were proposed to improve upon PAM. The NSC methods base their classification rules on shrunken centroids; in practice the amount of shrinkage i...
Main authors: | Blagus, Rok, Lusa, Lara |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3687811/ https://www.ncbi.nlm.nih.gov/pubmed/23433084 http://dx.doi.org/10.1186/1471-2105-14-64 |
_version_ | 1782273989076320256 |
---|---|
author | Blagus, Rok Lusa, Lara |
author_facet | Blagus, Rok Lusa, Lara |
author_sort | Blagus, Rok |
collection | PubMed |
description | BACKGROUND: PAM, a nearest shrunken centroid method (NSC), is a popular classification method for high-dimensional data. ALP and AHP are NSC algorithms that were proposed to improve upon PAM. The NSC methods base their classification rules on shrunken centroids; in practice the amount of shrinkage is estimated by minimizing the overall cross-validated (CV) error rate. RESULTS: We show that when data are class-imbalanced the three NSC classifiers are biased towards the majority class. The bias is larger when the number of variables or class-imbalance is larger and/or the differences between classes are smaller. To diminish the class-imbalance problem of the NSC classifiers we propose to estimate the amount of shrinkage by maximizing the CV geometric mean of the class-specific predictive accuracies (g-means). CONCLUSIONS: The results obtained on simulated and real high-dimensional class-imbalanced data show that our approach outperforms the currently used strategy based on the minimization of the overall error rate when NSC classifiers are biased towards the majority class. The number of variables included in the NSC classifiers when using our approach is much smaller than with the original approach. This result is supported by experiments on simulated and real high-dimensional class-imbalanced data. |
format | Online Article Text |
id | pubmed-3687811 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-36878112013-06-26 Improved shrunken centroid classifiers for high-dimensional class-imbalanced data Blagus, Rok Lusa, Lara BMC Bioinformatics Research Article BACKGROUND: PAM, a nearest shrunken centroid method (NSC), is a popular classification method for high-dimensional data. ALP and AHP are NSC algorithms that were proposed to improve upon PAM. The NSC methods base their classification rules on shrunken centroids; in practice the amount of shrinkage is estimated minimizing the overall cross-validated (CV) error rate. RESULTS: We show that when data are class-imbalanced the three NSC classifiers are biased towards the majority class. The bias is larger when the number of variables or class-imbalance is larger and/or the differences between classes are smaller. To diminish the class-imbalance problem of the NSC classifiers we propose to estimate the amount of shrinkage by maximizing the CV geometric mean of the class-specific predictive accuracies (g-means). CONCLUSIONS: The results obtained on simulated and real high-dimensional class-imbalanced data show that our approach outperforms the currently used strategy based on the minimization of the overall error rate when NSC classifiers are biased towards the majority class. The number of variables included in the NSC classifiers when using our approach is much smaller than with the original approach. This result is supported by experiments on simulated and real high-dimensional class-imbalanced data. BioMed Central 2013-02-23 /pmc/articles/PMC3687811/ /pubmed/23433084 http://dx.doi.org/10.1186/1471-2105-14-64 Text en Copyright © 2013 Blagus and Lusa; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Blagus, Rok Lusa, Lara Improved shrunken centroid classifiers for high-dimensional class-imbalanced data |
title | Improved shrunken centroid classifiers for high-dimensional class-imbalanced data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3687811/ https://www.ncbi.nlm.nih.gov/pubmed/23433084 http://dx.doi.org/10.1186/1471-2105-14-64 |
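The selection criterion described in the abstract (choosing the amount of shrinkage by maximizing the cross-validated g-means rather than minimizing the overall error rate) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `cv_predictions` mapping from a candidate shrinkage threshold to cross-validated predicted labels, and the toy data are all hypothetical, and the NSC classifier itself is not fitted here.

```python
import math
from collections import defaultdict


def class_accuracies(y_true, y_pred):
    """Fraction of correctly predicted samples within each true class."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


def g_means(y_true, y_pred):
    """Geometric mean of the class-specific predictive accuracies."""
    accs = list(class_accuracies(y_true, y_pred).values())
    return math.prod(accs) ** (1.0 / len(accs))


def error_rate(y_true, y_pred):
    """Overall misclassification rate."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_threshold(y_true, cv_predictions, criterion="g-means"):
    """Pick a threshold from cv_predictions (hypothetical: threshold -> CV labels)."""
    if criterion == "g-means":
        return max(cv_predictions, key=lambda th: g_means(y_true, cv_predictions[th]))
    return min(cv_predictions, key=lambda th: error_rate(y_true, cv_predictions[th]))


# Toy imbalanced data (made up for illustration): 8 majority, 2 minority samples.
y_true = [0] * 8 + [1] * 2
cv_predictions = {
    2.0: [0] * 10,                       # heavy shrinkage: everything predicted as majority
    0.5: [0] * 5 + [1] * 3 + [1] * 2,    # lighter shrinkage: minority class recovered
}

by_error = select_threshold(y_true, cv_predictions, criterion="error")
by_gmeans = select_threshold(y_true, cv_predictions, criterion="g-means")
```

In this toy case the overall error rate favors the threshold that assigns every sample to the majority class (error 0.2 versus 0.3), while g-means is zero for that threshold because the minority-class accuracy is zero, so the g-means criterion selects the other one.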