
A particle swarm based hybrid system for imbalanced medical data sampling

BACKGROUND: Medical and biological data commonly have small sample sizes, missing values, and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance problem in medical and biological data mining. This hybrid...


Bibliographic Details
Main Authors: Yang, Pengyi; Xu, Liang; Zhou, Bing B; Zhang, Zili; Zomaya, Albert Y
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2788388/
https://www.ncbi.nlm.nih.gov/pubmed/19958499
http://dx.doi.org/10.1186/1471-2164-10-S3-S34
author Yang, Pengyi
Xu, Liang
Zhou, Bing B
Zhang, Zili
Zomaya, Albert Y
collection PubMed
description BACKGROUND: Medical and biological data commonly have small sample sizes, missing values, and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance problem in medical and biological data mining. This hybrid system combines the particle swarm optimization (PSO) algorithm with multiple classifiers and evaluation metrics for evaluation fusion. Samples from the majority class are ranked using multiple objectives according to their merit in compensating for the class imbalance, and the selected samples are then combined with the minority class to form a balanced dataset. RESULTS: One important finding of this study is that different classifiers and metrics often provide different evaluation results. Nevertheless, the proposed hybrid system demonstrates consistent improvements over several alternative methods with three different metrics. The sampling results also generalize well across different types of classification algorithms, indicating the advantage of the information fusion applied in the hybrid system. CONCLUSION: The experimental results demonstrate that, unlike many currently available methods, which often perform unevenly across different datasets, the proposed hybrid system has a better generalization property, which alleviates the method-data dependency problem. From the biological perspective, the system points to highly ranked samples that merit further investigation, which may result in the discovery of new conditions or disease subtypes.
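The sampling idea in the description (score subsets of majority-class samples with several classifiers and metrics, rank the samples, keep as many as there are minority samples) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the classifiers, the metrics, or the PSO settings, so the nearest-centroid and 1-NN classifiers, the balanced-accuracy and G-mean metrics, the binary PSO update, every parameter value, and all function names below are assumptions chosen only to keep the example self-contained.

```python
import math
import random

def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(major, minor, x):
    """Assumed classifier 1: predict 1 (minority) if its centroid is closer."""
    return 1 if dist2(x, centroid(minor)) < dist2(x, centroid(major)) else 0

def one_nn(major, minor, x):
    """Assumed classifier 2: predict the class of the nearest training sample."""
    return 1 if min(dist2(x, p) for p in minor) < min(dist2(x, p) for p in major) else 0

def sens_spec(clf, major, minor, val):
    # val is a list of (point, label) pairs and must contain both classes.
    tp = sum(1 for x, y in val if y == 1 and clf(major, minor, x) == 1)
    tn = sum(1 for x, y in val if y == 0 and clf(major, minor, x) == 0)
    pos = sum(1 for _, y in val if y == 1)
    return tp / pos, tn / (len(val) - pos)

def fitness(mask, major, minor, val):
    """Average of assumed metrics over assumed classifiers (simple fusion)."""
    chosen = [p for p, bit in zip(major, mask) if bit]
    if not chosen:
        return 0.0
    scores = []
    for clf in (nearest_centroid, one_nn):
        se, sp = sens_spec(clf, chosen, minor, val)
        scores += [(se + sp) / 2, math.sqrt(se * sp)]  # balanced accuracy, G-mean
    return sum(scores) / len(scores)

def binary_pso(major, minor, val, rng, n_particles=10, n_iter=20,
               w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: each bit says whether a majority sample is selected."""
    n = len(major)
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, major, minor, val) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                # A sigmoid of the velocity gives the probability of bit = 1.
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i], major, minor, val)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest

def balanced_sample(major, minor, val, n_runs=5, seed=0):
    """Rank majority samples by selection frequency across runs; keep the top ones."""
    rng = random.Random(seed)
    votes = [0] * len(major)
    for _ in range(n_runs):
        for d, bit in enumerate(binary_pso(major, minor, val, rng)):
            votes[d] += bit
    ranked = sorted(range(len(major)), key=lambda d: -votes[d])
    keep = ranked[:len(minor)]
    return [major[d] for d in keep], minor
```

Calling `balanced_sample(major, minor, val)` with lists of numeric feature vectors and a labelled validation list returns a majority subset the same size as the minority class. Ranking by vote count across runs is one plausible reading of "ranked using multiple objectives"; the published system may rank differently.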
format Text
id pubmed-2788388
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2788388 2009-12-04 A particle swarm based hybrid system for imbalanced medical data sampling Yang, Pengyi Xu, Liang Zhou, Bing B Zhang, Zili Zomaya, Albert Y BMC Genomics Proceedings BioMed Central 2009-12-03 /pmc/articles/PMC2788388/ /pubmed/19958499 http://dx.doi.org/10.1186/1471-2164-10-S3-S34 Text en Copyright ©2009 Yang et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A particle swarm based hybrid system for imbalanced medical data sampling
topic Proceedings