An Elastic Self-Adjusting Technique for Rare-Class Synthetic Oversampling Based on Cluster Distortion Minimization in Data Stream
Adaptive machine learning has increasing importance due to its ability to classify a data stream and handle the changes in the data distribution. Various resources, such as wearable sensors and medical devices, can generate a data stream with an imbalanced distribution of classes. Many popular overs...
Main Authors: | Fatlawi, Hayder K., Kiss, Attila |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9963940/ https://www.ncbi.nlm.nih.gov/pubmed/36850659 http://dx.doi.org/10.3390/s23042061 |
_version_ | 1784896379507703808 |
---|---|
author | Fatlawi, Hayder K. Kiss, Attila |
collection | PubMed |
description | Adaptive machine learning has increasing importance due to its ability to classify a data stream and handle the changes in the data distribution. Various resources, such as wearable sensors and medical devices, can generate a data stream with an imbalanced distribution of classes. Many popular oversampling techniques have been designed for imbalanced batch data rather than a continuous stream. This work proposes a self-adjusting window to improve the adaptive classification of an imbalanced data stream based on minimizing cluster distortion. It includes two models; the first chooses only the previous data instances that preserve the coherence of the current chunk’s samples. The second model relaxes the strict filter by excluding the examples of the last chunk. Both models include generating synthetic points for oversampling rather than the actual data points. The evaluation of the proposed models using the Siena EEG dataset showed their ability to improve the performance of several adaptive classifiers. The best results have been obtained using Adaptive Random Forest in which Sensitivity reached 96.83% and Precision reached 99.96%. |
format | Online Article Text |
id | pubmed-9963940 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9963940 2023-02-26 An Elastic Self-Adjusting Technique for Rare-Class Synthetic Oversampling Based on Cluster Distortion Minimization in Data Stream Fatlawi, Hayder K. Kiss, Attila Sensors (Basel) Article Adaptive machine learning has increasing importance due to its ability to classify a data stream and handle the changes in the data distribution. Various resources, such as wearable sensors and medical devices, can generate a data stream with an imbalanced distribution of classes. Many popular oversampling techniques have been designed for imbalanced batch data rather than a continuous stream. This work proposes a self-adjusting window to improve the adaptive classification of an imbalanced data stream based on minimizing cluster distortion. It includes two models; the first chooses only the previous data instances that preserve the coherence of the current chunk’s samples. The second model relaxes the strict filter by excluding the examples of the last chunk. Both models include generating synthetic points for oversampling rather than the actual data points. The evaluation of the proposed models using the Siena EEG dataset showed their ability to improve the performance of several adaptive classifiers. The best results have been obtained using Adaptive Random Forest in which Sensitivity reached 96.83% and Precision reached 99.96%. MDPI 2023-02-11 /pmc/articles/PMC9963940/ /pubmed/36850659 http://dx.doi.org/10.3390/s23042061 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | An Elastic Self-Adjusting Technique for Rare-Class Synthetic Oversampling Based on Cluster Distortion Minimization in Data Stream |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9963940/ https://www.ncbi.nlm.nih.gov/pubmed/36850659 http://dx.doi.org/10.3390/s23042061 |