Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data

Transductive graph-based semisupervised learning methods usually build an undirected graph utilizing both labeled and unlabeled samples as vertices. Those methods propagate label information of labeled samples to neighbors through their edges in order to get the predicted labels of unlabeled samples...

Bibliographic Details
Main Authors: Li, Fengqi, Yu, Chuang, Yang, Nanhai, Xia, Feng, Li, Guangming, Kaveh-Yazdy, Fatemeh
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3725769/
https://www.ncbi.nlm.nih.gov/pubmed/23935439
http://dx.doi.org/10.1155/2013/875450
_version_ 1782278581349515264
author Li, Fengqi
Yu, Chuang
Yang, Nanhai
Xia, Feng
Li, Guangming
Kaveh-Yazdy, Fatemeh
author_facet Li, Fengqi
Yu, Chuang
Yang, Nanhai
Xia, Feng
Li, Guangming
Kaveh-Yazdy, Fatemeh
author_sort Li, Fengqi
collection PubMed
description Transductive graph-based semisupervised learning methods usually build an undirected graph that uses both labeled and unlabeled samples as vertices. These methods propagate the label information of labeled samples to neighbors along the graph edges to predict the labels of unlabeled samples. Most popular semisupervised learning approaches are sensitive to the initial label distribution, a problem that arises in imbalanced labeled datasets. In imbalanced classification, the class boundary is severely skewed by the majority classes. In this paper, we propose a simple and effective approach that alleviates the unfavorable influence of the imbalance problem by iteratively selecting a few unlabeled samples and adding them to the minority classes, forming a balanced labeled dataset for the subsequent learning methods. Experiments on UCI datasets and the MNIST handwritten digits dataset showed that the proposed approach outperforms other existing state-of-the-art methods.
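The balancing step summarized in the description can be pictured with a small sketch. The record does not say how unlabeled samples are chosen, so the rule below (take the unlabeled point closest to any labeled member of a minority class, one per round) is an assumption made for illustration and not the paper's procedure. The function name `balance_by_nearest_unlabeled` and the `per_round` parameter are hypothetical.

```python
import math
from collections import Counter


def balance_by_nearest_unlabeled(labeled, unlabeled, per_round=1):
    """Grow minority classes with nearby unlabeled points until classes balance.

    labeled:   list of (point, label) pairs; each point is a tuple of floats
    unlabeled: list of points
    Returns (new_labeled, remaining_unlabeled); the inputs are not modified.
    ASSUMPTION: the nearest-neighbor selection rule is illustrative only,
    not the selection rule used in the paper.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    # Every class is grown up to the size of the largest labeled class.
    target = max(Counter(lbl for _, lbl in labeled).values())

    while pool:
        counts = Counter(lbl for _, lbl in labeled)
        minority = [cls for cls, n in counts.items() if n < target]
        if not minority:
            break  # all classes have reached the target size
        for cls in minority:
            members = [p for p, lbl in labeled if lbl == cls]
            need = min(per_round, target - counts[cls], len(pool))
            # Rank unlabeled points by distance to the closest member of cls.
            pool.sort(key=lambda u: min(math.dist(u, m) for m in members))
            picked, pool = pool[:need], pool[need:]
            labeled.extend((p, cls) for p in picked)
    return labeled, pool


# Toy data: class "a" has three labeled points, class "b" has one.
labeled = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
           ((5.0, 5.0), "b")]
unlabeled = [(5.1, 5.0), (4.9, 5.2), (0.2, 0.1), (9.0, 9.0)]
new_labeled, rest = balance_by_nearest_unlabeled(labeled, unlabeled)
print(Counter(lbl for _, lbl in new_labeled), rest)
```

Adding one sample per round lets each newly added point influence the next selection, which is one way to read "iteratively" in the description. A graph-based label propagation method would then be run on the balanced labeled set.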
format Online
Article
Text
id pubmed-3725769
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-37257692013-08-09 Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data Li, Fengqi Yu, Chuang Yang, Nanhai Xia, Feng Li, Guangming Kaveh-Yazdy, Fatemeh ScientificWorldJournal Research Article Transductive graph-based semisupervised learning methods usually build an undirected graph utilizing both labeled and unlabeled samples as vertices. Those methods propagate label information of labeled samples to neighbors through their edges in order to get the predicted labels of unlabeled samples. Most popular semi-supervised learning approaches are sensitive to initial label distribution which happened in imbalanced labeled datasets. The class boundary will be severely skewed by the majority classes in an imbalanced classification. In this paper, we proposed a simple and effective approach to alleviate the unfavorable influence of imbalance problem by iteratively selecting a few unlabeled samples and adding them into the minority classes to form a balanced labeled dataset for the learning methods afterwards. The experiments on UCI datasets and MNIST handwritten digits dataset showed that the proposed approach outperforms other existing state-of-art methods. Hindawi Publishing Corporation 2013-07-10 /pmc/articles/PMC3725769/ /pubmed/23935439 http://dx.doi.org/10.1155/2013/875450 Text en Copyright © 2013 Fengqi Li et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Li, Fengqi
Yu, Chuang
Yang, Nanhai
Xia, Feng
Li, Guangming
Kaveh-Yazdy, Fatemeh
Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data
title Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data
title_full Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data
title_fullStr Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data
title_full_unstemmed Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data
title_short Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data
title_sort iterative nearest neighborhood oversampling in semisupervised learning from imbalanced data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3725769/
https://www.ncbi.nlm.nih.gov/pubmed/23935439
http://dx.doi.org/10.1155/2013/875450
work_keys_str_mv AT lifengqi iterativenearestneighborhoodoversamplinginsemisupervisedlearningfromimbalanceddata
AT yuchuang iterativenearestneighborhoodoversamplinginsemisupervisedlearningfromimbalanceddata
AT yangnanhai iterativenearestneighborhoodoversamplinginsemisupervisedlearningfromimbalanceddata
AT xiafeng iterativenearestneighborhoodoversamplinginsemisupervisedlearningfromimbalanceddata
AT liguangming iterativenearestneighborhoodoversamplinginsemisupervisedlearningfromimbalanceddata
AT kavehyazdyfatemeh iterativenearestneighborhoodoversamplinginsemisupervisedlearningfromimbalanceddata