Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification
Multi-class imbalanced classification tasks are characterized by the skewed distribution of examples among the classes and, usually, strong overlapping between class regions in the feature space. Furthermore, frequently the goal of the final system is to obtain very high precision for each of the co...
Main authors: | Żak, Michał; Woźniak, Michał |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303687/ http://dx.doi.org/10.1007/978-3-030-50423-6_11 |
_version_ | 1783548113286332416 |
---|---|
author | Żak, Michał Woźniak, Michał |
author_facet | Żak, Michał Woźniak, Michał |
author_sort | Żak, Michał |
collection | PubMed |
description | Multi-class imbalanced classification tasks are characterized by the skewed distribution of examples among the classes and, usually, strong overlapping between class regions in the feature space. Furthermore, the goal of the final system is frequently to obtain very high precision for each of the concepts. All of these factors contribute to the complexity of the task and make it harder for learning algorithms to build a quality data model. One way of addressing these challenges is to use so-called binarization strategies, which decompose the multi-class problem into several binary tasks of lower complexity. Because each of these methods uses a different decomposition scheme, some of them are considered better suited to handling imbalanced data than others. In this study, we focus on the well-known binary approaches, namely One-Vs-All, One-Vs-One, and Error-Correcting Output Codes, and their effectiveness in multi-class imbalanced data classification, with respect to the base classifiers and various aggregation schemes for each of the strategies. We compare the performance of these approaches and try to boost the performance of seemingly weaker methods by sampling algorithms. A detailed comparative experimental study of the considered methods, supported by statistical analysis, is presented. The results show the differences among various binarization strategies. We show how one can mitigate those differences using simple oversampling methods. |
format | Online Article Text |
id | pubmed-7303687 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7303687 2020-06-19 Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification Żak, Michał Woźniak, Michał Computational Science – ICCS 2020 Article 2020-05-23 /pmc/articles/PMC7303687/ http://dx.doi.org/10.1007/978-3-030-50423-6_11 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Żak, Michał Woźniak, Michał Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification |
title | Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification |
title_full | Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification |
title_fullStr | Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification |
title_full_unstemmed | Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification |
title_short | Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification |
title_sort | performance analysis of binarization strategies for multi-class imbalanced data classification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303687/ http://dx.doi.org/10.1007/978-3-030-50423-6_11 |
work_keys_str_mv | AT zakmichał performanceanalysisofbinarizationstrategiesformulticlassimbalanceddataclassification AT wozniakmichał performanceanalysisofbinarizationstrategiesformulticlassimbalanceddataclassification |
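
The description field mentions decomposing a multi-class problem into binary tasks (One-Vs-All, One-Vs-One) and using simple oversampling to offset class imbalance. The following is a minimal sketch of those two ideas only, written with the Python standard library. The function names, the toy label list, and the choice of plain random oversampling are illustrative assumptions; they are not taken from the article, and Error-Correcting Output Codes and the aggregation schemes are not covered.

```python
from collections import Counter
from itertools import combinations
import random


def one_vs_all_tasks(labels):
    """One binary task per class: 1 marks that class, 0 marks all other classes."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_one_tasks(labels):
    """One binary task per unordered class pair, restricted to examples of the pair.

    Each task is (indices into labels, binary labels), with 1 for the first class.
    """
    classes = sorted(set(labels))
    tasks = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        tasks[(a, b)] = (idx, [1 if labels[i] == a else 0 for i in idx])
    return tasks


def random_oversample(binary_labels, seed=0):
    """Return example indices in which the smaller class is duplicated at random
    until both classes of the binary task have the same number of examples."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for i, y in enumerate(binary_labels):
        by_class[y].append(i)
    target = max(len(members) for members in by_class.values())
    resampled = []
    for members in by_class.values():
        resampled.extend(members)
        if members:  # nothing to duplicate if a class is absent
            resampled.extend(rng.choices(members, k=target - len(members)))
    return resampled


# Hypothetical, heavily skewed three-class label list (6 / 3 / 1 examples).
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 1

ova = one_vs_all_tasks(y)
ovo = one_vs_one_tasks(y)
balanced = random_oversample(ova["c"])

print("OVA class counts for 'c' vs rest:", Counter(ova["c"]))
print("OVO task sizes:", {pair: len(idx) for pair, (idx, _) in ovo.items()})
print("After oversampling:", Counter(ova["c"][i] for i in balanced))
```

In this toy setting the One-Vs-All task for the rarest class is more skewed than any of the pairwise One-Vs-One tasks, which is the kind of difference between decomposition schemes that the description refers to. A real experiment would train a base classifier on each binary task and aggregate the outputs.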