
An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data

BACKGROUND: Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of the patients in order to ease the decision making process regarding medical treatment and financial preparation. Recently, the breast cancer da...


Bibliographic Details
Main Authors: Wang, Kung-Jeng; Makond, Bunjira; Wang, Kung-Min
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3829096/
https://www.ncbi.nlm.nih.gov/pubmed/24207108
http://dx.doi.org/10.1186/1472-6947-13-124
_version_ 1782291323271774208
author Wang, Kung-Jeng
Makond, Bunjira
Wang, Kung-Min
author_facet Wang, Kung-Jeng
Makond, Bunjira
Wang, Kung-Min
author_sort Wang, Kung-Jeng
collection PubMed
description BACKGROUND: Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of the patients in order to ease the decision making process regarding medical treatment and financial preparation. Recently, the breast cancer data sets have been imbalanced (i.e., the number of survival patients outnumbers the number of non-survival patients), whereas the standard classifiers are not applicable for the imbalanced data sets. The methods to improve survivability prognosis of breast cancer need further study. METHODS: Two well-known five-year prognosis models/classifiers [i.e., logistic regression (LR) and decision tree (DT)] are constructed by combining synthetic minority over-sampling technique (SMOTE), cost-sensitive classifier technique (CSC), under-sampling, bagging, and boosting. The feature selection method is used to select relevant variables, while the pruning technique is applied to obtain low information-burden models. These methods are applied on data obtained from the Surveillance, Epidemiology, and End Results database. The improvements of survivability prognosis of breast cancer are investigated based on the experimental results. RESULTS: Experimental results confirm that the DT and LR models combined with SMOTE, CSC, and under-sampling generate higher predictive performance consecutively than the original ones. Most of the time, DT and LR models combined with SMOTE and CSC use less informative burden/features when a feature selection method and a pruning technique are applied. CONCLUSIONS: LR is found to have better statistical power than DT in predicting five-year survivability. CSC is superior to SMOTE, under-sampling, bagging, and boosting to improve the prognostic performance of DT and LR.
format Online
Article
Text
id pubmed-3829096
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3829096 2013-11-20 An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data Wang, Kung-Jeng Makond, Bunjira Wang, Kung-Min BMC Med Inform Decis Mak Research Article BioMed Central 2013-11-09 /pmc/articles/PMC3829096/ /pubmed/24207108 http://dx.doi.org/10.1186/1472-6947-13-124 Text en Copyright © 2013 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data
title_sort improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3829096/
https://www.ncbi.nlm.nih.gov/pubmed/24207108
http://dx.doi.org/10.1186/1472-6947-13-124