Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer
BACKGROUND: Breast cancer is the most common cancer in US women and the second leading cause of cancer death among women. OBJECTIVES: To compare and evaluate the performance and accuracy of key supervised and semi-supervised machine learning algorithms for breast cancer prediction. MATERIALS AND...
Main authors: | Al-Azzam, Nosayba; Shatnawi, Ibrahem |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7806524/ https://www.ncbi.nlm.nih.gov/pubmed/33489117 http://dx.doi.org/10.1016/j.amsu.2020.12.043 |
_version_ | 1783636542117380096 |
---|---|
author | Al-Azzam, Nosayba; Shatnawi, Ibrahem |
author_facet | Al-Azzam, Nosayba; Shatnawi, Ibrahem |
author_sort | Al-Azzam, Nosayba |
collection | PubMed |
description | BACKGROUND: Breast cancer is the most common cancer in US women and the second leading cause of cancer death among women. OBJECTIVES: To compare and evaluate the performance and accuracy of key supervised and semi-supervised machine learning algorithms for breast cancer prediction. MATERIALS AND METHODS: We used nine machine learning classification algorithms for supervised learning (SL) and semi-supervised learning (SSL): 1) logistic regression; 2) Gaussian naive Bayes; 3) linear support vector machine; 4) RBF support vector machine; 5) decision tree; 6) random forest; 7) XGBoost; 8) gradient boosting; 9) KNN. The Wisconsin Diagnosis Cancer dataset was used to train and test these models. To ensure the robustness of the models, we applied K-fold cross-validation and optimized hyperparameters. We evaluated and compared the models using accuracy, precision, recall, F1-score, and ROC curves. RESULTS: The results of all models are encouraging under both SL and SSL. SSL achieved high accuracy (90%–98%) with just half of the training data. The KNN model for SL and logistic regression for SSL achieved the highest accuracy of 98%. CONCLUSION: The accuracies of the SSL algorithms are very close to those of the SL algorithms. The accuracies of all models are in the range of 91–98%. SSL is a promising and competitive approach to this problem. Using a small sample of labeled data and low computational power, SSL is fully capable of replacing SL algorithms in diagnosing tumor type. |
format | Online Article Text |
id | pubmed-7806524 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-7806524 2021-01-22 Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer Al-Azzam, Nosayba; Shatnawi, Ibrahem Ann Med Surg (Lond) Original Research BACKGROUND: Breast cancer is the most common cancer in US women and the second leading cause of cancer death among women. OBJECTIVES: To compare and evaluate the performance and accuracy of key supervised and semi-supervised machine learning algorithms for breast cancer prediction. MATERIALS AND METHODS: We used nine machine learning classification algorithms for supervised learning (SL) and semi-supervised learning (SSL): 1) logistic regression; 2) Gaussian naive Bayes; 3) linear support vector machine; 4) RBF support vector machine; 5) decision tree; 6) random forest; 7) XGBoost; 8) gradient boosting; 9) KNN. The Wisconsin Diagnosis Cancer dataset was used to train and test these models. To ensure the robustness of the models, we applied K-fold cross-validation and optimized hyperparameters. We evaluated and compared the models using accuracy, precision, recall, F1-score, and ROC curves. RESULTS: The results of all models are encouraging under both SL and SSL. SSL achieved high accuracy (90%–98%) with just half of the training data. The KNN model for SL and logistic regression for SSL achieved the highest accuracy of 98%. CONCLUSION: The accuracies of the SSL algorithms are very close to those of the SL algorithms. The accuracies of all models are in the range of 91–98%. SSL is a promising and competitive approach to this problem. Using a small sample of labeled data and low computational power, SSL is fully capable of replacing SL algorithms in diagnosing tumor type. Elsevier 2021-01-08 /pmc/articles/PMC7806524/ /pubmed/33489117 http://dx.doi.org/10.1016/j.amsu.2020.12.043 Text en © 2021 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Original Research Al-Azzam, Nosayba Shatnawi, Ibrahem Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer |
title | Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer |
title_full | Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer |
title_fullStr | Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer |
title_full_unstemmed | Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer |
title_short | Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer |
title_sort | comparing supervised and semi-supervised machine learning models on diagnosing breast cancer |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7806524/ https://www.ncbi.nlm.nih.gov/pubmed/33489117 http://dx.doi.org/10.1016/j.amsu.2020.12.043 |
work_keys_str_mv | AT alazzamnosayba comparingsupervisedandsemisupervisedmachinelearningmodelsondiagnosingbreastcancer AT shatnawiibrahem comparingsupervisedandsemisupervisedmachinelearningmodelsondiagnosingbreastcancer |
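
The abstract above names K-fold cross-validation and four scalar metrics (accuracy, precision, recall, F1-score) as its evaluation procedure. The following is a minimal, standard-library-only sketch of those two ideas, not the authors' code: the function names `k_fold_indices` and `binary_metrics`, the choice of contiguous unshuffled folds, the value of `k`, and the toy labels are all assumptions made for illustration. The record does not state which K, shuffling, or software the study used.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds.

    Assumption: no shuffling or stratification; the record does not say
    how the study built its folds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels (hypothetical, not taken from the dataset): 1 = malignant, 0 = benign.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, predicted))
    for train_idx, test_idx in k_fold_indices(10, 3):
        print(len(train_idx), test_idx)
```

In a full comparison, each classifier would be fitted on the training indices of every fold and scored on the held-out indices, with the per-fold metrics averaged. ROC curves, which the abstract also mentions, need predicted scores rather than hard labels and are left out of this sketch.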