
Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification

Bibliographic Details
Main Authors: Li, Jinyan, Fong, Simon, Sung, Yunsick, Cho, Kyungeun, Wong, Raymond, Wong, Kelvin K. L.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5131504/
https://www.ncbi.nlm.nih.gov/pubmed/27980678
http://dx.doi.org/10.1186/s13040-016-0117-1
_version_ 1782470910497783808
author Li, Jinyan
Fong, Simon
Sung, Yunsick
Cho, Kyungeun
Wong, Raymond
Wong, Kelvin K. L.
author_sort Li, Jinyan
collection PubMed
description BACKGROUND: An imbalanced dataset is a training dataset in which the class of interest and the class of no interest are represented in unequal proportions. In biomedical applications, samples from the class of interest, such as medical anomalies, positive clinical tests, and particular diseases, are often rare in a population. Because the target samples in the original dataset are few, a classification model induced from such training data predicts poorly, owing to insufficient training on the minority class. RESULTS: In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling in a swarm optimisation algorithm and adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with other versions of the SMOTE algorithm, ASCB_DmSMOTE shows significant improvements, including higher accuracy and credibility. CONCLUSIONS: The proposed method combines two rebalancing techniques. It re-allocates the majority class in detail and dynamically optimises the two parameters of SMOTE to synthesise a reasonably sized minority class for each clustered imbalanced sub-dataset. The proposed method ultimately outperforms other conventional methods, attaining higher credibility and even greater accuracy of the classification model.
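The two rebalancing steps named in the description, over-sampling the minority class and under-sampling the majority class, can be illustrated with a minimal sketch. This is not the authors' ASCB_DmSMOTE: it shows only basic SMOTE-style interpolation and plain random under-sampling, with no clustering and no swarm optimisation. The function names `smote` and `undersample`, and the parameters `n_synthetic`, `k`, `n_keep` and `seed`, are hypothetical names chosen for this illustration. It also assumes that the two SMOTE parameters mentioned in the description are the amount of over-sampling and the number of nearest neighbours, which the record itself does not state.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority samples (random under-sampling)."""
    rng = random.Random(seed)
    return rng.sample(majority, min(n_keep, len(majority)))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
    majority = [(float(i), float(i % 7)) for i in range(5, 35)]
    new_points = smote(minority, n_synthetic=6, k=2)
    kept = undersample(majority, n_keep=9)
    print(len(minority) + len(new_points), len(kept))  # prints "9 9"
```

In this sketch `n_synthetic` and `k` are fixed by hand. According to the description, the proposed method instead searches for suitable values adaptively with a swarm optimisation algorithm, separately for each clustered sub-dataset.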
format Online
Article
Text
id pubmed-5131504
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5131504 2016-12-15 Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification Li, Jinyan Fong, Simon Sung, Yunsick Cho, Kyungeun Wong, Raymond Wong, Kelvin K. L. BioData Min Research
BioMed Central 2016-12-01 /pmc/articles/PMC5131504/ /pubmed/27980678 http://dx.doi.org/10.1186/s13040-016-0117-1 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5131504/
https://www.ncbi.nlm.nih.gov/pubmed/27980678
http://dx.doi.org/10.1186/s13040-016-0117-1