Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification

Bibliographic Details
Main authors: Żak, Michał; Woźniak, Michał
Format: Online, Article, Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303687/
http://dx.doi.org/10.1007/978-3-030-50423-6_11
collection PubMed
description Multi-class imbalanced classification tasks are characterized by a skewed distribution of examples among the classes and, usually, strong overlap between class regions in the feature space. Furthermore, the goal of the final system is frequently to obtain very high precision for each of the concepts. All of these factors add to the complexity of the task and make it harder for learning algorithms to build a high-quality data model. One way of addressing these challenges is to use so-called binarization strategies, which decompose the multi-class problem into several binary tasks of lower complexity. Because each of these methods uses a different decomposition scheme, some of them are considered better suited to handling imbalanced data than others. In this study, we focus on the well-known binarization approaches, namely One-Vs-All, One-Vs-One, and Error-Correcting Output Codes, and on their effectiveness in multi-class imbalanced data classification with respect to the base classifiers and the various aggregation schemes for each of the strategies. We compare the performance of these approaches and try to boost the performance of the seemingly weaker methods with sampling algorithms. We present a detailed comparative experimental study of the considered methods, supported by statistical analysis. The results show the differences among the binarization strategies, and we show how one can mitigate those differences using simple oversampling methods.
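The decomposition schemes named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the function names (`ova_tasks`, `ovo_tasks`, `ecoc_tasks`, `ecoc_decode`, `imbalance_ratio`, `random_oversample`), the toy class counts, the hand-picked ECOC code matrix, Hamming decoding as the aggregation rule, and random duplication as the oversampling method are all hypothetical choices made here for illustration. The sketch shows how each strategy turns one multi-class label vector into binary tasks and how skewed each resulting task is.

```python
"""Toy sketch of One-Vs-All, One-Vs-One and ECOC decomposition on imbalanced labels.

All names, the class counts and the code matrix are hypothetical illustrations,
not the authors' implementation.
"""
from collections import Counter
from itertools import combinations
import random


def ova_tasks(labels):
    """One-Vs-All: one binary task per class (that class -> 1, all other classes -> 0)."""
    return {c: [1 if y == c else 0 for y in labels] for c in sorted(set(labels))}


def ovo_tasks(labels):
    """One-Vs-One: one task per class pair, using only the samples of those two classes.

    Returns {(a, b): (sample_indices, targets)}, where target 1 means class a.
    """
    tasks = {}
    for a, b in combinations(sorted(set(labels)), 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        tasks[(a, b)] = (idx, [1 if labels[i] == a else 0 for i in idx])
    return tasks


def ecoc_tasks(labels, code_matrix):
    """Dense ECOC: code_matrix maps class -> tuple of +1/-1; column j defines binary task j."""
    n_bits = len(next(iter(code_matrix.values())))
    return [[1 if code_matrix[y][j] == 1 else 0 for y in labels] for j in range(n_bits)]


def ecoc_decode(bits, code_matrix):
    """Hamming decoding: pick the class whose codeword disagrees with the fewest predicted bits.

    A predicted bit of 0 is read as -1; ties are broken by alphabetical class order.
    """
    def distance(codeword):
        return sum(1 for b, c in zip(bits, codeword) if (1 if b == 1 else -1) != c)
    return min(sorted(code_matrix), key=lambda cls: distance(code_matrix[cls]))


def imbalance_ratio(targets):
    """Majority count divided by minority count of a binary target list."""
    counts = Counter(targets)
    return max(counts.values()) / min(counts.values())


def random_oversample(indices, targets, seed=0):
    """Duplicate randomly chosen minority samples until both classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(targets)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [i for i, t in zip(indices, targets) if t == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return list(indices) + extra, list(targets) + [minority] * deficit


if __name__ == "__main__":
    labels = ["A"] * 70 + ["B"] * 20 + ["C"] * 7 + ["D"] * 3  # hypothetical toy data
    code = {"A": (1, 1, -1), "B": (-1, 1, 1), "C": (1, -1, 1), "D": (-1, -1, -1)}

    for cls, targets in ova_tasks(labels).items():
        print("OvA", cls, round(imbalance_ratio(targets), 2))
    for pair, (_, targets) in ovo_tasks(labels).items():
        print("OvO", pair, round(imbalance_ratio(targets), 2))
    for j, targets in enumerate(ecoc_tasks(labels, code)):
        print("ECOC column", j, round(imbalance_ratio(targets), 2))

    # Rebalance the most skewed OvA task (class D vs. rest) by random oversampling.
    _, balanced = random_oversample(list(range(len(labels))), ova_tasks(labels)["D"])
    print("OvA D after oversampling", imbalance_ratio(balanced))
    print("ECOC decode of bits [1, 0, 1]:", ecoc_decode([1, 0, 1], code))
```

With these toy counts, the One-Vs-All task for the rarest class D is skewed 97:3, while the One-Vs-One tasks involving D range from 70:3 (against A) down to 7:3 (against C), and random oversampling brings the D-vs-rest task to 97:97. Library counterparts with real base classifiers exist, for example the meta-estimators in scikit-learn's `sklearn.multiclass` module.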
id pubmed-7303687
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7303687; 2020-06-19; Computational Science – ICCS 2020; Article
2020-05-23; /pmc/articles/PMC7303687/; http://dx.doi.org/10.1007/978-3-030-50423-6_11; Text; en; © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
topic Article