Cargando…

Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms

In the era of a large number of tools and applications that constantly produce massive amounts of data, their processing and proper classification is becoming both increasingly hard and important. This task is hindered by changing the distribution of data over time, called the concept drift, and the...

Descripción completa

Detalles Bibliográficos
Autores principales: Wegier, Weronika, Ksieniewicz, Pawel
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517449/
https://www.ncbi.nlm.nih.gov/pubmed/33286620
http://dx.doi.org/10.3390/e22080849
_version_ 1783587227942518784
author Wegier, Weronika
Ksieniewicz, Pawel
author_facet Wegier, Weronika
Ksieniewicz, Pawel
author_sort Wegier, Weronika
collection PubMed
description In the era of a large number of tools and applications that constantly produce massive amounts of data, their processing and proper classification is becoming both increasingly hard and important. This task is hindered by changing the distribution of data over time, called the concept drift, and the emergence of a problem of disproportion between classes—such as in the detection of network attacks or fraud detection problems. In the following work, we propose methods to modify existing stream processing solutions—Accuracy Weighted Ensemble (AWE) and Accuracy Updated Ensemble (AUE), which have demonstrated their effectiveness in adapting to time-varying class distribution. The introduced changes are aimed at increasing their quality on binary classification of imbalanced data. The proposed modifications contain the inclusion of aggregate metrics, such as F1-score, G-mean and balanced accuracy score in calculation of the member classifiers weights, which affects their composition and final prediction. Moreover, the impact of data sampling on the algorithm’s effectiveness was also checked. Complex experiments were conducted to define the most promising modification type, as well as to compare proposed methods with existing solutions. Experimental evaluation shows an improvement in the quality of classification compared to the underlying algorithms and other solutions for processing imbalanced data streams.
format Online
Article
Text
id pubmed-7517449
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75174492020-11-09 Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms Wegier, Weronika Ksieniewicz, Pawel Entropy (Basel) Article In the era of a large number of tools and applications that constantly produce massive amounts of data, their processing and proper classification is becoming both increasingly hard and important. This task is hindered by changing the distribution of data over time, called the concept drift, and the emergence of a problem of disproportion between classes—such as in the detection of network attacks or fraud detection problems. In the following work, we propose methods to modify existing stream processing solutions—Accuracy Weighted Ensemble (AWE) and Accuracy Updated Ensemble (AUE), which have demonstrated their effectiveness in adapting to time-varying class distribution. The introduced changes are aimed at increasing their quality on binary classification of imbalanced data. The proposed modifications contain the inclusion of aggregate metrics, such as F1-score, G-mean and balanced accuracy score in calculation of the member classifiers weights, which affects their composition and final prediction. Moreover, the impact of data sampling on the algorithm’s effectiveness was also checked. Complex experiments were conducted to define the most promising modification type, as well as to compare proposed methods with existing solutions. Experimental evaluation shows an improvement in the quality of classification compared to the underlying algorithms and other solutions for processing imbalanced data streams. MDPI 2020-07-31 /pmc/articles/PMC7517449/ /pubmed/33286620 http://dx.doi.org/10.3390/e22080849 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Wegier, Weronika
Ksieniewicz, Pawel
Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms
title Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms
title_full Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms
title_fullStr Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms
title_full_unstemmed Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms
title_short Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms
title_sort application of imbalanced data classification quality metrics as weighting methods of the ensemble data stream classification algorithms
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517449/
https://www.ncbi.nlm.nih.gov/pubmed/33286620
http://dx.doi.org/10.3390/e22080849
work_keys_str_mv AT wegierweronika applicationofimbalanceddataclassificationqualitymetricsasweightingmethodsoftheensembledatastreamclassificationalgorithms
AT ksieniewiczpawel applicationofimbalanceddataclassificationqualitymetricsasweightingmethodsoftheensembledatastreamclassificationalgorithms