Analysis of Breast Cancer Detection Using Different Machine Learning Techniques

Data mining algorithms play an important role in the prediction of early-stage breast cancer. In this paper, we propose an approach that improves the accuracy and enhances the performance of three different classifiers: Decision Tree (J48), Naïve Bayes (NB), and Sequential Minimal Optimization (SMO)...

Bibliographic Details
Main authors: Mohammed, Siham A., Darrab, Sadeq, Noaman, Salah A., Saake, Gunter
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7351679/
http://dx.doi.org/10.1007/978-981-15-7205-0_10
_version_ 1783557486284898304
author Mohammed, Siham A.
Darrab, Sadeq
Noaman, Salah A.
Saake, Gunter
collection PubMed
description Data mining algorithms play an important role in the prediction of early-stage breast cancer. In this paper, we propose an approach that improves the accuracy and enhances the performance of three different classifiers: Decision Tree (J48), Naïve Bayes (NB), and Sequential Minimal Optimization (SMO). We also validate and compare the classifiers on two benchmark datasets: the Wisconsin Breast Cancer (WBC) dataset and the Breast Cancer dataset. Class imbalance is a major problem in the classification phase: because an instance is far more likely to belong to the majority class, the algorithms tend to assign new observations to the majority class. We address this problem using a data-level approach, which resamples the data to mitigate the effect of class imbalance. For evaluation, 10-fold cross-validation is performed. The efficiency of each classifier is assessed in terms of true positives, false positives, ROC curve, standard deviation (Std), and accuracy (AC). Experiments show that using a resample filter enhances the classifiers' performance, with SMO outperforming the others on the WBC dataset and J48 performing best on the Breast Cancer dataset.
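The two evaluation steps named in the abstract, resampling to reduce class imbalance and 10-fold cross-validation, can be illustrated with a minimal standard-library Python sketch. This is not the authors' pipeline or the resample filter they used: the function names `resample_balanced` and `stratified_folds` are hypothetical, the labels are made-up toy data, and the choice to oversample every class up to the size of the largest one is an assumption made for illustration.

```python
import random
from collections import Counter, defaultdict


def resample_balanced(labels, seed=0):
    """Return positions drawn with replacement so every class appears equally often.

    Assumption: each class is sampled up to the size of the largest class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = max(len(members) for members in by_class.values())
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.choices(by_class[label], k=target))
    rng.shuffle(chosen)
    return chosen


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of each class round-robin across the folds.
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


# Toy labels, invented for illustration (not taken from either dataset).
labels = ["benign"] * 70 + ["malignant"] * 30

folds = stratified_folds(labels, k=10)
for test_fold in folds:
    held_out = set(test_fold)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    train_labels = [labels[i] for i in train_idx]
    # Resample only the training part so the held-out fold stays untouched.
    balanced = [train_idx[j] for j in resample_balanced(train_labels)]
    counts = Counter(labels[i] for i in balanced)
    # A classifier would be trained on `balanced` and scored on `test_fold` here.

print(counts)
```

In this sketch the resampling is applied inside each training split rather than to the whole dataset, which keeps duplicated instances out of the held-out fold; the abstract does not say which of the two the authors did.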
format Online
Article
Text
id pubmed-7351679
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7351679 2020-07-13 Analysis of Breast Cancer Detection Using Different Machine Learning Techniques Mohammed, Siham A. Darrab, Sadeq Noaman, Salah A. Saake, Gunter Data Mining and Big Data Article 2020-07-11 /pmc/articles/PMC7351679/ http://dx.doi.org/10.1007/978-981-15-7205-0_10 Text en © Springer Nature Singapore Pte Ltd. 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Analysis of Breast Cancer Detection Using Different Machine Learning Techniques
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7351679/
http://dx.doi.org/10.1007/978-981-15-7205-0_10