Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning
OBJECTIVE: Early prediction of breast cancer is one of the most essential fields of medicine. Many studies have introduced prediction approaches to facilitate the early prediction and estimate the future occurrence based on mammography periodic tests. In the current research, we introduce a novel ma...
Main authors: | Khozama, Sam; Mayya, Ali Mahmoud |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | West Asia Organization for Cancer Prevention, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9068177/ https://www.ncbi.nlm.nih.gov/pubmed/34837911 http://dx.doi.org/10.31557/APJCP.2021.22.11.3543 |
_version_ | 1784700170378674176 |
---|---|
author | Khozama, Sam; Mayya, Ali Mahmoud |
author_facet | Khozama, Sam; Mayya, Ali Mahmoud |
author_sort | Khozama, Sam |
collection | PubMed |
description | OBJECTIVE: Early prediction of breast cancer is one of the most essential fields of medicine. Many studies have introduced prediction approaches to facilitate early prediction and estimate future occurrence based on periodic mammography tests. In the current research, we introduce a novel machine learning tool for the early prediction of breast cancer. METHODS: Three basic resources are used to identify the most essential risk factors: the BCSC (Breast Cancer Surveillance Consortium) dataset, a medical questionnaire, and multiple international breast cancer reports. The BCSC dataset is normalized and balanced; the questionnaire and the medical reports are then analyzed to define the degree of importance and a potential weight for each risk factor. These weights are used to scale the risk factors, and an optimizable tree-based ML model is then trained on the balanced, weighted risk-factor datasets. RESULTS: Three balanced versions of the BCSC dataset are used: oversampled, down-sampled and mixed. Each risk factor is assigned a weight (1, 2 or 4) based on a mathematical modelling of the questionnaire and the international breast cancer reports. The experiments are run on the weighted and non-weighted versions of the database, and they indicate that performance increases significantly with the weighted version of the risk factors. The tests show that down-weighting the non-essential risk factors increases accuracy and reduces errors. The overall accuracy of the weighted balanced datasets reaches 100%, 95.8% and 95.9% for the down-sampled, oversampled and mixed datasets, respectively. CONCLUSION: Weighting the risk factors of the BCSC dataset improves performance by increasing accuracy and reducing the false rejection and false discovery rates for all versions of the balanced datasets. The weighting approach can also be used to improve the estimation of the breast cancer risk score by scaling the individual scores of the risk factors. |
format | Online Article Text |
id | pubmed-9068177 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | West Asia Organization for Cancer Prevention |
record_format | MEDLINE/PubMed |
spelling | pubmed-9068177 2022-05-06 Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning Khozama, Sam; Mayya, Ali Mahmoud Asian Pac J Cancer Prev Research Article OBJECTIVE: Early prediction of breast cancer is one of the most essential fields of medicine. Many studies have introduced prediction approaches to facilitate early prediction and estimate future occurrence based on periodic mammography tests. In the current research, we introduce a novel machine learning tool for the early prediction of breast cancer. METHODS: Three basic resources are used to identify the most essential risk factors: the BCSC (Breast Cancer Surveillance Consortium) dataset, a medical questionnaire, and multiple international breast cancer reports. The BCSC dataset is normalized and balanced; the questionnaire and the medical reports are then analyzed to define the degree of importance and a potential weight for each risk factor. These weights are used to scale the risk factors, and an optimizable tree-based ML model is then trained on the balanced, weighted risk-factor datasets. RESULTS: Three balanced versions of the BCSC dataset are used: oversampled, down-sampled and mixed. Each risk factor is assigned a weight (1, 2 or 4) based on a mathematical modelling of the questionnaire and the international breast cancer reports. The experiments are run on the weighted and non-weighted versions of the database, and they indicate that performance increases significantly with the weighted version of the risk factors. The tests show that down-weighting the non-essential risk factors increases accuracy and reduces errors. The overall accuracy of the weighted balanced datasets reaches 100%, 95.8% and 95.9% for the down-sampled, oversampled and mixed datasets, respectively. CONCLUSION: Weighting the risk factors of the BCSC dataset improves performance by increasing accuracy and reducing the false rejection and false discovery rates for all versions of the balanced datasets. The weighting approach can also be used to improve the estimation of the breast cancer risk score by scaling the individual scores of the risk factors. West Asia Organization for Cancer Prevention 2021-11 /pmc/articles/PMC9068177/ /pubmed/34837911 http://dx.doi.org/10.31557/APJCP.2021.22.11.3543 Text en https://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Khozama, Sam Mayya, Ali Mahmoud Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning |
title | Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning |
title_full | Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning |
title_fullStr | Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning |
title_full_unstemmed | Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning |
title_short | Study the Effect of the Risk Factors in the Estimation of the Breast Cancer Risk Score Using Machine Learning |
title_sort | study the effect of the risk factors in the estimation of the breast cancer risk score using machine learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9068177/ https://www.ncbi.nlm.nih.gov/pubmed/34837911 http://dx.doi.org/10.31557/APJCP.2021.22.11.3543 |
work_keys_str_mv | AT khozamasam studytheeffectoftheriskfactorsintheestimationofthebreastcancerriskscoreusingmachinelearning AT mayyaalimahmoud studytheeffectoftheriskfactorsintheestimationofthebreastcancerriskscoreusingmachinelearning |
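
The abstract in this record describes two preprocessing steps before the tree-based model is trained: balancing the dataset (for example by down-sampling) and scaling each risk factor by a weight of 1, 2 or 4. The following is only a minimal sketch of those two steps under stated assumptions. The risk-factor names, the weight assigned to each factor, the toy records, and the helper functions `scale_row` and `down_sample` are all hypothetical; the record does not say which factor receives which weight or how the scaling is implemented.

```python
import random
from collections import Counter

# Hypothetical risk-factor names and weights. The abstract only states that
# each weight is 1, 2 or 4, not which factor receives which value.
WEIGHTS = {"age_group": 4, "family_history": 2, "bmi_group": 1}


def scale_row(row, weights):
    """Multiply each risk-factor value by its weight."""
    return {name: row[name] * w for name, w in weights.items()}


def down_sample(rows, labels, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy records made up for illustration (not BCSC data): 6 negatives, 2 positives.
rows = [
    {"age_group": i % 3, "family_history": i % 2, "bmi_group": i % 4}
    for i in range(8)
]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = down_sample(rows, labels)
weighted_rows = [scale_row(r, WEIGHTS) for r in balanced_rows]
class_counts = Counter(balanced_labels)
print(class_counts)
```

The resulting `weighted_rows` and `balanced_labels` could then be passed to any tree-based classifier; the choice of classifier and its tuning are outside this sketch.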