Selective oversampling approach for strongly imbalanced data
Challenges posed by imbalanced data are encountered in many real-world applications. One of the possible approaches to improve the classifier performance on imbalanced data is oversampling. In this paper, we propose the new selective oversampling approach (SOA) that first isolates the most represent...
Main authors: | Gnip, Peter; Vokorokos, Liberios; Drotár, Peter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Data Mining and Machine Learning |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237317/ https://www.ncbi.nlm.nih.gov/pubmed/34239981 http://dx.doi.org/10.7717/peerj-cs.604 |
_version_ | 1783714706785042432 |
---|---|
author | Gnip, Peter Vokorokos, Liberios Drotár, Peter |
author_facet | Gnip, Peter Vokorokos, Liberios Drotár, Peter |
author_sort | Gnip, Peter |
collection | PubMed |
description | Challenges posed by imbalanced data are encountered in many real-world applications. One of the possible approaches to improve the classifier performance on imbalanced data is oversampling. In this paper, we propose the new selective oversampling approach (SOA) that first isolates the most representative samples from minority classes by using an outlier detection technique and then utilizes these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely, the synthetic minority oversampling technique and adaptive synthetic sampling. The prediction performance is evaluated on four synthetic datasets and four real-world datasets, and the proposed SOA methods always achieved the same or better performance than other considered existing oversampling methods. |
format | Online Article Text |
id | pubmed-8237317 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8237317 2021-07-07 Selective oversampling approach for strongly imbalanced data Gnip, Peter Vokorokos, Liberios Drotár, Peter PeerJ Comput Sci Data Mining and Machine Learning Challenges posed by imbalanced data are encountered in many real-world applications. One of the possible approaches to improve the classifier performance on imbalanced data is oversampling. In this paper, we propose the new selective oversampling approach (SOA) that first isolates the most representative samples from minority classes by using an outlier detection technique and then utilizes these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely, the synthetic minority oversampling technique and adaptive synthetic sampling. The prediction performance is evaluated on four synthetic datasets and four real-world datasets, and the proposed SOA methods always achieved the same or better performance than other considered existing oversampling methods. PeerJ Inc. 2021-06-18 /pmc/articles/PMC8237317/ /pubmed/34239981 http://dx.doi.org/10.7717/peerj-cs.604 Text en ©2021 Gnip et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Data Mining and Machine Learning Gnip, Peter Vokorokos, Liberios Drotár, Peter Selective oversampling approach for strongly imbalanced data |
title | Selective oversampling approach for strongly imbalanced data |
title_full | Selective oversampling approach for strongly imbalanced data |
title_fullStr | Selective oversampling approach for strongly imbalanced data |
title_full_unstemmed | Selective oversampling approach for strongly imbalanced data |
title_short | Selective oversampling approach for strongly imbalanced data |
title_sort | selective oversampling approach for strongly imbalanced data |
topic | Data Mining and Machine Learning |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237317/ https://www.ncbi.nlm.nih.gov/pubmed/34239981 http://dx.doi.org/10.7717/peerj-cs.604 |
work_keys_str_mv | AT gnippeter selectiveoversamplingapproachforstronglyimbalanceddata AT vokorokosliberios selectiveoversamplingapproachforstronglyimbalanceddata AT drotarpeter selectiveoversamplingapproachforstronglyimbalanceddata |
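
The two-stage idea in the description (first keep the most representative minority samples by filtering outliers, then oversample synthetically from the kept samples only) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which outlier detector is used, so the mean k-nearest-neighbour distance below is an assumed stand-in, the interpolation step is a simplified SMOTE-style generator, and every function name and parameter (`select_representative`, `smote_like`, `selective_oversample`, `keep_fraction`) is hypothetical.

```python
import math
import random


def knn(points, i, k):
    """Return indices of the (up to) k nearest neighbours of points[i], excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def select_representative(minority, k=3, keep_fraction=0.8):
    """Keep the minority samples with the smallest mean k-NN distance.

    Assumption: mean k-NN distance stands in for the outlier detection
    technique, which the record does not name.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    scores = []
    for i in range(len(minority)):
        neighbours = knn(minority, i, k)
        total = sum(math.dist(minority[i], minority[j]) for j in neighbours)
        scores.append(total / len(neighbours))
    # Keep at least k + 1 samples so the later neighbour search has candidates.
    n_keep = max(k + 1, int(len(minority) * keep_fraction))
    ranked = sorted(range(len(minority)), key=lambda i: scores[i])
    return [minority[i] for i in ranked[:n_keep]]


def smote_like(samples, n_new, k=3, seed=0):
    """Create n_new points by interpolating between a sample and one of its neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        j = rng.choice(knn(samples, i, k))
        t = rng.random()  # position along the segment between the two samples
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(samples[i], samples[j]))
        )
    return synthetic


def selective_oversample(minority, n_new, k=3, keep_fraction=0.8, seed=0):
    """Filter outliers first, then oversample from the remaining samples only."""
    core = select_representative(minority, k, keep_fraction)
    return smote_like(core, n_new, k, seed)


# Toy minority class: a tight cluster plus one distant point.
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
core = select_representative(minority)
new_points = selective_oversample(minority, n_new=10)
```

In this toy example the distant point is excluded before interpolation, so none of the synthetic samples are drawn toward it. A plain SMOTE-style generator applied to all six points could place new samples on the segment between the cluster and the distant point.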