A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data

Class imbalance presents a major hurdle in the application of classification methods. A common approach is to learn ensembles of classifiers from rebalanced data. Examples include bootstrap aggregating (bagging) combined with either undersampling or oversampling of the minority class examples...

Bibliographic Details
Main authors: Collell, Guillem; Prelec, Drazen; Patil, Kaustubh R.
Format: Online Article Text
Language: English
Published: Elsevier Science Publishers, 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5750819/
https://www.ncbi.nlm.nih.gov/pubmed/29398782
http://dx.doi.org/10.1016/j.neucom.2017.08.035
author Collell, Guillem
Prelec, Drazen
Patil, Kaustubh R.
collection PubMed
description Class imbalance presents a major hurdle in the application of classification methods. A common approach is to learn ensembles of classifiers from rebalanced data. Examples include bootstrap aggregating (bagging) combined with either undersampling or oversampling of the minority class examples. However, rebalancing methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases. Furthermore, these methods often require specifying the performance measure of interest a priori, i.e., before learning. An alternative is the threshold-moving technique, which applies a threshold to the continuous output of a model and can therefore be adapted to a performance measure a posteriori, i.e., it is a plug-in method. Surprisingly, little attention has been paid to the combination of a bagging ensemble and threshold-moving. In this paper, we study this combination and demonstrate its competitiveness. In contrast to resampling methods, we preserve the natural class distribution of the data, which results in well-calibrated posterior probabilities. Additionally, we extend the proposed method to handle multiclass data. We validate our method on binary and multiclass benchmark data sets, using both decision trees and neural networks as base classifiers. We also perform analyses that provide insights into the proposed method.
format Online
Article
Text
id pubmed-5750819
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Elsevier Science Publishers
record_format MEDLINE/PubMed
spelling pubmed-5750819 2018-01-31 Neurocomputing Article Elsevier Science Publishers /pmc/articles/PMC5750819/ /pubmed/29398782 http://dx.doi.org/10.1016/j.neucom.2017.08.035 Text en © 2017 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data
topic Article
work_keys_str_mv AT collellguillem asimplepluginbaggingensemblebasedonthresholdmovingforclassifyingbinaryandmulticlassimbalanceddata
AT prelecdrazen asimplepluginbaggingensemblebasedonthresholdmovingforclassifyingbinaryandmulticlassimbalanceddata
AT patilkaustubhr asimplepluginbaggingensemblebasedonthresholdmovingforclassifyingbinaryandmulticlassimbalanceddata
AT collellguillem simplepluginbaggingensemblebasedonthresholdmovingforclassifyingbinaryandmulticlassimbalanceddata
AT prelecdrazen simplepluginbaggingensemblebasedonthresholdmovingforclassifyingbinaryandmulticlassimbalanceddata
AT patilkaustubhr simplepluginbaggingensemblebasedonthresholdmovingforclassifyingbinaryandmulticlassimbalanceddata
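
The combination described in the abstract (a bagging ensemble trained on the natural class distribution, followed by threshold-moving on its averaged output) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the record: the synthetic one-dimensional data, the decision stump used as a stand-in base learner, the helper names (fit_stump, fit_bagging, predict_proba, classify), and in particular the choice of the minority-class prior as the moved threshold. It is not the authors' implementation.

```python
import random


def fit_stump(xs, ys):
    """Fit a 1-D decision stump; each leaf stores its fraction of positives."""
    pairs = sorted(zip(xs, ys))
    n, total_pos = len(pairs), sum(ys)
    best = None
    pos_left = 0
    for i in range(1, n):
        pos_left += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point exists between two equal feature values
        p_left = pos_left / i
        p_right = (total_pos - pos_left) / (n - i)
        # Size-weighted Gini impurity of the two leaves (lower is better).
        gini = i * p_left * (1 - p_left) + (n - i) * p_right * (1 - p_right)
        if best is None or gini < best[0]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (gini, cut, p_left, p_right)
    if best is None:  # all feature values identical: use the class frequency
        p = total_pos / n
        return (0.0, p, p)
    return best[1:]


def predict_stump(stump, x):
    cut, p_left, p_right = stump
    return p_left if x <= cut else p_right


def fit_bagging(xs, ys, n_estimators, rng):
    """Train each stump on a plain bootstrap sample (no rebalancing)."""
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_proba(stumps, x):
    """Average the leaf probabilities of all ensemble members."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def classify(stumps, x, threshold):
    """Threshold-moving: the cut-off is chosen after training."""
    return 1 if predict_proba(stumps, x) >= threshold else 0


# Synthetic imbalanced data (assumption): 450 negatives, 50 positives.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(450)]
xs += [rng.gauss(2.0, 1.0) for _ in range(50)]
ys = [0] * 450 + [1] * 50

stumps = fit_bagging(xs, ys, n_estimators=25, rng=rng)
prior = sum(ys) / len(ys)  # minority-class prior, used as the moved threshold


def recall(threshold):
    """Minority-class recall, measured on the training data for brevity."""
    hits = sum(classify(stumps, x, threshold) for x, y in zip(xs, ys) if y == 1)
    return hits / sum(ys)


recall_default = recall(0.5)
recall_moved = recall(prior)
print("recall at 0.5:", recall_default)
print("recall at prior:", recall_moved)
```

Because the threshold is applied only to the averaged output, the same trained ensemble can be reused with a different cut-off when the performance measure of interest changes. Lowering the threshold can only add predicted positives, so minority-class recall cannot decrease, although precision may.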