Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning

OBJECTIVE: Early prediction of breast cancer is one of the most essential tasks in medicine. Many studies have introduced approaches that facilitate early prediction and estimate future occurrence from periodic mammography tests. In the current research, we introduce a novel ma...

Bibliographic Details
Main Authors: Khozama, Sam, Mayya, Ali Mahmoud
Format: Online Article Text
Language: English
Published: West Asia Organization for Cancer Prevention 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9068177/
https://www.ncbi.nlm.nih.gov/pubmed/34837911
http://dx.doi.org/10.31557/APJCP.2021.22.11.3543
author Khozama, Sam
Mayya, Ali Mahmoud
collection PubMed
description OBJECTIVE: Early prediction of breast cancer is one of the most essential tasks in medicine. Many studies have introduced approaches that facilitate early prediction and estimate future occurrence from periodic mammography tests. In the current research, we introduce a novel machine learning tool for the early prediction of breast cancer. METHODS: Three basic resources are used to identify the most essential risk factors: the BCSC (Breast Cancer Surveillance Consortium) dataset, a medical questionnaire, and multiple international breast cancer reports. The BCSC dataset is normalized and balanced; the questionnaire and the medical reports are then analyzed to define the degree of importance and a potential weight for each risk factor. These weights are used to scale the risk factors, and an optimizable tree-based ML model is then trained on the balanced, weighted risk-factor datasets. RESULTS: Three balanced versions of the BCSC dataset are used: oversampled, down-sampled, and mixed. Each risk factor is assigned a weight (1, 2 or 4) based on mathematical modelling of the questionnaire and the international breast cancer reports. The experiments are applied to the weighted and non-weighted versions of the dataset, and they indicate that performance increases significantly with the weighted version of the risk factors. The tests show that down-weighting the non-essential risk factors increases accuracy and reduces errors. The overall accuracy on the weighted balanced datasets reaches 100%, 95.8% and 95.9% for the down-sampled, oversampled and mixed datasets, respectively. CONCLUSION: Weighting the risk factors of the BCSC dataset improves performance by increasing accuracy and reducing the false rejection and false discovery rates for all versions of the balanced datasets. The weighting approach can also be used to improve the estimated breast cancer risk score by scaling the individual scores of the risk factors.
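The two preprocessing steps named in the abstract, scaling each risk factor by its weight and balancing the classes, can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the factor names and the assignment of the weights 1, 2 and 4 to them are hypothetical, the balancing is plain random resampling, and the abstract does not say how the "mixed" dataset is built, so it is omitted here.

```python
import random
from collections import Counter

# Hypothetical weights. The abstract says each factor gets 1, 2 or 4,
# but does not say which factor gets which value.
WEIGHTS = {"age_group": 4, "family_history": 2, "bmi_group": 1}


def scale_factors(record, weights):
    """Multiply each (already normalized) risk-factor value by its weight."""
    return {name: record[name] * weights[name] for name in weights}


def _group_by_class(rows, labels):
    """Collect rows into one list per class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def downsample(rows, labels, seed=0):
    """Randomly drop rows so every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    n = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def oversample(rows, labels, seed=0):
    """Randomly duplicate rows so every class is as large as the biggest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    n = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(n - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


if __name__ == "__main__":
    record = {"age_group": 0.5, "family_history": 1.0, "bmi_group": 0.25}
    print(scale_factors(record, WEIGHTS))

    rows = list(range(10))
    labels = [1] * 2 + [0] * 8  # 2 positive rows, 8 negative rows
    print(Counter(downsample(rows, labels)[1]))
    print(Counter(oversample(rows, labels)[1]))
```

The scaled records from either balanced set would then be passed to whatever tree-based learner is used; the abstract does not name a specific library or model configuration.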
format Online
Article
Text
id pubmed-9068177
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher West Asia Organization for Cancer Prevention
record_format MEDLINE/PubMed
spelling pubmed-9068177 2022-05-06 Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning Khozama, Sam Mayya, Ali Mahmoud Asian Pac J Cancer Prev Research Article West Asia Organization for Cancer Prevention 2021-11 /pmc/articles/PMC9068177/ /pubmed/34837911 http://dx.doi.org/10.31557/APJCP.2021.22.11.3543 Text en https://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9068177/
https://www.ncbi.nlm.nih.gov/pubmed/34837911
http://dx.doi.org/10.31557/APJCP.2021.22.11.3543