
Prediction of Breast Cancer from Imbalance Respect Using Cluster-Based Undersampling Method

To address the two-class imbalance problem in breast cancer diagnosis, a hybrid of K-means and Boosted C5.0 (K-Boosted C5.0), based on undersampling, is proposed. K-means is used to select the informative samples near the boundary. During the training phase, the K-means...


Bibliographic Details
Main Authors: Zhang, Jue; Chen, Li; Abid, Fazeel
Format: Online Article Text
Language: English
Published: Hindawi 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6817921/
https://www.ncbi.nlm.nih.gov/pubmed/31737241
http://dx.doi.org/10.1155/2019/7294582
_version_ 1783463525379735552
author Zhang, Jue
Chen, Li
Abid, Fazeel
collection PubMed
description To address the two-class imbalance problem in breast cancer diagnosis, a hybrid of K-means and Boosted C5.0 (K-Boosted C5.0), based on undersampling, is proposed. K-means is used to select the informative samples near the boundary. During the training phase, the K-means algorithm clusters the majority and minority instances and selects a similar number of instances from each cluster. Boosted C5.0 is then used as the classifier. Because instance selection via clustering encourages diversity in the training subspace of K-Boosted C5.0, it is expected to improve performance. To test the new hybrid classifier, it is applied to 12 small-scale and 2 large-scale datasets that are commonly used in class-imbalanced learning. The experimental results show that the proposed hybrid method outperforms most of the competing algorithms in terms of Matthews correlation coefficient (MCC) and accuracy. It can be a good alternative to well-known machine learning methods.
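The cluster-based undersampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified variant that clusters only the majority class, and the selection rule (keep the instances nearest each centroid), the function names, the value of k, and the toy data are all assumptions made for illustration. The Boosted C5.0 classifier is omitted; any classifier could be trained on the returned set.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        assignment = [nearest_centroid(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the final centroids.
    assignment = [nearest_centroid(p, centroids) for p in points]
    return centroids, assignment


def cluster_undersample(majority, minority, k=3, seed=0):
    """Keep a similar number of majority instances from each cluster.

    Assumption: instances nearest the centroid are kept; the paper's own
    selection criterion may differ. Labels: 0 = majority, 1 = minority.
    """
    centroids, assignment = kmeans(majority, k, seed=seed)
    per_cluster = max(1, len(minority) // k)
    selected = []
    for c in range(k):
        members = [p for p, a in zip(majority, assignment) if a == c]
        members.sort(key=lambda p: squared_distance(p, centroids[c]))
        selected.extend(members[:per_cluster])
    return [(p, 0) for p in selected] + [(p, 1) for p in minority]


# Hypothetical toy data: 12 majority points, 3 minority points.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.1, 5.1),
            (9.0, 0.0), (9.1, 0.2), (9.2, 0.1), (9.1, 0.1)]
minority = [(2.0, 2.0), (2.5, 2.5), (3.0, 2.0)]

balanced = cluster_undersample(majority, minority, k=3)
print(len(balanced), "training instances after undersampling")
```

Taking a similar number of instances from every cluster keeps each region of the majority class represented, which is the diversity argument the abstract makes; all minority instances are retained unchanged.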
format Online
Article
Text
id pubmed-6817921
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-6817921 2019-11-17 Prediction of Breast Cancer from Imbalance Respect Using Cluster-Based Undersampling Method. Zhang, Jue; Chen, Li; Abid, Fazeel. J Healthc Eng. Research Article. Hindawi 2019-10-16 /pmc/articles/PMC6817921/ /pubmed/31737241 http://dx.doi.org/10.1155/2019/7294582 Text en Copyright © 2019 Jue Zhang et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prediction of Breast Cancer from Imbalance Respect Using Cluster-Based Undersampling Method
topic Research Article