
An Elastic Self-Adjusting Technique for Rare-Class Synthetic Oversampling Based on Cluster Distortion Minimization in Data Stream


Bibliographic Details
Main Authors: Fatlawi, Hayder K.; Kiss, Attila
Format: Online, Article, Text
Language: English
Published: MDPI 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9963940/
https://www.ncbi.nlm.nih.gov/pubmed/36850659
http://dx.doi.org/10.3390/s23042061
collection PubMed
description Adaptive machine learning has increasing importance due to its ability to classify a data stream and handle the changes in the data distribution. Various resources, such as wearable sensors and medical devices, can generate a data stream with an imbalanced distribution of classes. Many popular oversampling techniques have been designed for imbalanced batch data rather than a continuous stream. This work proposes a self-adjusting window to improve the adaptive classification of an imbalanced data stream based on minimizing cluster distortion. It includes two models; the first chooses only the previous data instances that preserve the coherence of the current chunk’s samples. The second model relaxes the strict filter by excluding the examples of the last chunk. Both models include generating synthetic points for oversampling rather than the actual data points. The evaluation of the proposed models using the Siena EEG dataset showed their ability to improve the performance of several adaptive classifiers. The best results have been obtained using Adaptive Random Forest in which Sensitivity reached 96.83% and Precision reached 99.96%.
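The abstract above names two ingredients without giving formulas: a cluster-distortion criterion that decides which earlier rare-class instances are carried into the current chunk, and synthetic (rather than copied) oversampling. The Python sketch below illustrates those ideas under stated assumptions and is not the authors' implementation. It assumes that "distortion" means the mean squared distance of the rare-class points to a single centroid, tests each earlier rare-class point independently against the current chunk, and generates synthetic points by SMOTE-style linear interpolation. The function names `mean_distortion`, `select_coherent` and `smote_like` are hypothetical.

```python
import numpy as np


def mean_distortion(points: np.ndarray) -> float:
    """Assumed distortion: mean squared Euclidean distance to one centroid."""
    centroid = points.mean(axis=0)
    return float(((points - centroid) ** 2).sum(axis=1).mean())


def select_coherent(current: np.ndarray, previous: np.ndarray) -> np.ndarray:
    """Hypothetical strict filter: keep an earlier rare-class point only if
    adding it alone to the current chunk does not raise the mean distortion."""
    base = mean_distortion(current)
    keep = [x for x in previous
            if mean_distortion(np.vstack([current, x])) <= base]
    # reshape keeps a (0, d) shape when nothing passes the filter
    return np.array(keep).reshape(-1, current.shape[1])


def smote_like(pool: np.ndarray, n_new: int, k: int = 3, seed: int = 0) -> np.ndarray:
    """SMOTE-style synthesis: interpolate between a random pool point and
    one of its k nearest neighbours inside the pool."""
    if len(pool) < 2:
        raise ValueError("need at least two points to interpolate")
    rng = np.random.default_rng(seed)
    k = min(k, len(pool) - 1)
    synthetic = np.empty((n_new, pool.shape[1]))
    for i in range(n_new):
        a = pool[rng.integers(len(pool))]
        dists = ((pool - a) ** 2).sum(axis=1)
        # skip the closest entry, which is normally the point itself
        neighbours = np.argsort(dists)[1:k + 1]
        b = pool[rng.choice(neighbours)]
        synthetic[i] = a + rng.random() * (b - a)
    return synthetic


# Usage: rare-class points of the current chunk and of an earlier chunk.
current = np.array([[0., 0.], [2., 0.], [0., 2.], [2., 2.]])
previous = np.array([[1., 1.], [10., 10.]])
kept = select_coherent(current, previous)  # keeps [1, 1], drops [10, 10]
pool = np.vstack([current, kept])
new_points = smote_like(pool, n_new=5)
```

The sketch covers only the first, strict model described in the abstract; the second model's relaxation (excluding the examples of the last chunk) is not reproduced. Because every synthetic point is a convex combination of two retained points, it never falls outside the region spanned by the filtered pool.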
id pubmed-9963940
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9963940 2023-02-26; Sensors (Basel), Article; MDPI 2023-02-11; © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).