LVQ-SMOTE – Learning Vector Quantization based Synthetic Minority Over–sampling Technique for biomedical data

BACKGROUND: Over-sampling methods based on the Synthetic Minority Over-sampling Technique (SMOTE) have been proposed for classification problems on imbalanced biomedical data. However, existing over-sampling methods achieve only slightly better, and sometimes worse, results than the simplest SMOTE. In order...

Bibliographic Details
Main authors:	Nakamura, Munehiro; Kajiwara, Yusuke; Otsuka, Atsushi; Kimura, Haruhiko
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4016036/ https://www.ncbi.nlm.nih.gov/pubmed/24088532 http://dx.doi.org/10.1186/1756-0381-6-16

author	Nakamura, Munehiro Kajiwara, Yusuke Otsuka, Atsushi Kimura, Haruhiko
collection	PubMed
description	BACKGROUND: Over-sampling methods based on the Synthetic Minority Over-sampling Technique (SMOTE) have been proposed for classification problems on imbalanced biomedical data. However, existing over-sampling methods achieve only slightly better, and sometimes worse, results than the simplest SMOTE. To improve the effectiveness of SMOTE, this paper presents a novel over-sampling method that uses codebooks obtained by learning vector quantization (LVQ). In general, even when an existing SMOTE is applied to a biomedical dataset, the empty feature space remains so large that most classification algorithms do not perform well at estimating the borderlines between classes. To tackle this problem, our over-sampling method generates synthetic samples that occupy more of the feature space than those of other SMOTE algorithms. In brief, our over-sampling method generates useful synthetic samples by referring to actual samples taken from real-world datasets. RESULTS: Experiments on eight real-world imbalanced datasets demonstrate that the proposed over-sampling method performs better than the simplest SMOTE on four of five standard classification algorithms. Moreover, the performance of our method increases when MWMOTE, the latest SMOTE variant, is used in our algorithm. Experiments on datasets for β-turn type prediction show some important patterns that have not been seen in previous analyses. CONCLUSIONS: The proposed over-sampling method generates useful synthetic samples for the classification of imbalanced biomedical data. In addition, the proposed over-sampling method is broadly compatible with basic classification algorithms and existing over-sampling methods.
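For orientation, the following is a minimal sketch of the basic SMOTE interpolation step that the abstract uses as its baseline. It is not the authors' LVQ-SMOTE, and the record gives no implementation details for the codebook step. The function name `smote`, its parameters, and the NumPy-only neighbour search are assumptions made for illustration.

```python
import numpy as np

def smote(minority, n_synthetic, k=5, seed=0):
    """Illustrative basic SMOTE (hypothetical helper, not the paper's code).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest neighbours of every minority sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each synthetic point.
    base = rng.integers(0, n, size=n_synthetic)
    pick = rng.integers(0, k, size=n_synthetic)
    chosen = neighbours[base, pick]

    # Interpolate with a random gap in [0, 1).
    gap = rng.random((n_synthetic, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])
```

Because every synthetic point is a convex combination of two existing minority samples, basic SMOTE never leaves the region those samples already span, which is consistent with the abstract's remark that much of the feature space stays empty.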
format	Online Article Text
id	pubmed-4016036
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4016036 2014-05-23 BioData Min Research BioMed Central 2013-10-02 /pmc/articles/PMC4016036/ /pubmed/24088532 http://dx.doi.org/10.1186/1756-0381-6-16 Text en Copyright © 2013 Nakamura et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	LVQ-SMOTE – Learning Vector Quantization based Synthetic Minority Over–sampling Technique for biomedical data
topic	Research
