Classification Prediction of Breast Cancer Based on Machine Learning

Bibliographic Details
Main authors:	Chen, Hua; Wang, Nan; Du, Xueping; Mei, Kehui; Zhou, Yuan; Cai, Guangxing
Format:	Online Article Text
Language:	English
Published:	Hindawi 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9848804/ https://www.ncbi.nlm.nih.gov/pubmed/36688223 http://dx.doi.org/10.1155/2023/6530719
Journal:	Comput Intell Neurosci (Computational Intelligence and Neuroscience)
Publication date:	2023-01-11
Record:	pubmed-9848804 (MEDLINE/PubMed, National Center for Biotechnology Information)
License:	Copyright © 2023 Hua Chen et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Breast cancer is one of the most common and deadliest cancers worldwide. To support its early diagnosis, this paper builds and compares breast cancer classification models based on machine learning algorithms including XGBoost, random forest, logistic regression, and K-nearest neighbors (KNN). Recall is the proportion of malignant cases that a model correctly identifies; because missing a malignant case is especially costly in medical diagnosis, recall serves as the primary evaluation metric, with precision, accuracy, and F1-score used as secondary metrics to compare the models. The data are standardized to remove the influence of features measured on different scales. To select a feature subset and improve model accuracy, 15 features are retained as model inputs on the basis of Pearson correlation tests. For the KNN model, the number of neighbors k is chosen by cross-validation, again with recall as the selection criterion. To handle the imbalance between positive and negative samples, stratified sampling is used so that the training and test sets preserve the class proportions. The experiments show that the same model performs differently depending on how the data are split into training and test sets (8:2 versus 7:3). In the comparison, the XGBoost model trained with an 8:2 split performs best, with a recall of 1.00, precision of 0.960, accuracy of 0.974, and F1-score of 0.980.
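The preprocessing and evaluation steps named in the abstract (standardization, Pearson-based feature filtering, stratified splitting, recall-driven choice of k for KNN, and the four metrics) can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the authors' code: all function names are hypothetical, malignant is assumed to be the positive class (label 1), ranking features by absolute Pearson correlation with the label is an assumed reading of "Pearson correlation test", KNN uses Euclidean distance, and the XGBoost, random forest, and logistic regression models are not reproduced.

```python
import math
import random
from statistics import mean, pstdev


def standardize(column):
    """Z-score one feature column (subtract the mean, divide by the population std)."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant feature carries no information
    return [(x - mu) / sigma for x in column]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_by_correlation(columns, labels, k):
    """Hypothetical filter: names of the k features with the largest |r| against the label."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], labels)), reverse=True)
    return ranked[:k]


def stratified_split(labels, test_fraction, seed=0):
    """Split row indices so each class keeps about the same share in train and test."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class test quota
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def metrics(y_true, y_pred, positive=1):
    """Recall, precision, accuracy and F1, treating `positive` (malignant) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of malignant cases detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "accuracy": accuracy, "f1": f1}


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(sorted(set(votes)), key=votes.count)


def choose_k(X, y, candidates, folds=5):
    """Hypothetical helper: the k with the highest recall over pooled out-of-fold predictions."""
    best_k, best_recall = None, -1.0
    for k in candidates:
        preds = [None] * len(X)
        for f in range(folds):
            # Rows i with i % folds == f are held out; the rest form the training fold.
            train_idx = [i for i in range(len(X)) if i % folds != f]
            for j in range(f, len(X), folds):
                preds[j] = knn_predict([X[i] for i in train_idx], [y[i] for i in train_idx], X[j], k)
        score = metrics(y, preds)["recall"]
        if score > best_recall:  # ties keep the earlier candidate
            best_k, best_recall = k, score
    return best_k
```

Under these assumptions, the 8:2 and 7:3 divisions correspond to `stratified_split(labels, 0.2)` and `stratified_split(labels, 0.3)`, and `top_k_by_correlation(columns, labels, 15)` would keep 15 features. `choose_k` assigns folds by row index and pools predictions before scoring, a simplification; stratified folds would mirror the paper's sampling more closely. The F1 formula in `metrics` also shows that the reported figures are mutually consistent: with precision 0.960 and recall 1.00, F1 = 2(0.960)(1.00)/(0.960 + 1.00) ≈ 0.9796, which rounds to 0.980.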
