Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer

Bibliographic Details
Main Authors: Al-Azzam, Nosayba, Shatnawi, Ibrahem
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7806524/
https://www.ncbi.nlm.nih.gov/pubmed/33489117
http://dx.doi.org/10.1016/j.amsu.2020.12.043
author Al-Azzam, Nosayba
Shatnawi, Ibrahem
collection PubMed
description BACKGROUND: Breast cancer is the most common cancer in US women and the second leading cause of cancer death among women. OBJECTIVES: To compare and evaluate the performance and accuracy of key supervised and semi-supervised machine learning algorithms for breast cancer prediction. MATERIALS AND METHODS: We used nine machine learning classification algorithms for supervised learning (SL) and semi-supervised learning (SSL): 1) logistic regression; 2) Gaussian naive Bayes; 3) linear support vector machine; 4) RBF support vector machine; 5) decision tree; 6) random forest; 7) XGBoost; 8) gradient boosting; 9) KNN. The Wisconsin Diagnosis Cancer dataset was used to train and test these models. To ensure the robustness of the models, we applied K-fold cross-validation and optimized hyperparameters. We evaluated and compared the models using accuracy, precision, recall, F1-score, and ROC curves. RESULTS: The results of all models are encouraging under both SL and SSL. SSL achieves high accuracy (90%–98%) with just half of the training data. The KNN model for SL and logistic regression for SSL achieved the highest accuracy, 98%. CONCLUSION: The accuracies of the SSL algorithms are very close to those of the SL algorithms. The accuracies of all models are in the range of 91–98%. SSL is a promising and competitive approach to this problem. Using a small sample of labeled data and low computational power, SSL is fully capable of replacing SL algorithms in diagnosing tumor type.
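The description names K-fold cross-validation and four threshold-based metrics (accuracy, precision, recall, F1-score) without defining them. The following is a minimal standard-library sketch of those two ideas, not the authors' code: the function names `binary_metrics` and `k_fold_indices` are hypothetical, labels are assumed to be encoded as 1 (positive class) and 0 (negative class), and the folds are contiguous and unshuffled, whereas a real study would normally shuffle or stratify them first.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits over range(n_samples).

    Folds are contiguous and unshuffled (an assumption made for brevity);
    the first n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    splits = []
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits
```

In a cross-validated evaluation, a model would be fitted on each `train` index list and scored on the matching `test` list with `binary_metrics`, and the per-fold scores would then be averaged.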
format Online
Article
Text
id pubmed-7806524
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7806524 2021-01-22 Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer Al-Azzam, Nosayba Shatnawi, Ibrahem Ann Med Surg (Lond) Original Research Elsevier 2021-01-08 /pmc/articles/PMC7806524/ /pubmed/33489117 http://dx.doi.org/10.1016/j.amsu.2020.12.043 Text en © 2021 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer
topic Original Research