
Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol

Background: Breast Cancer (BC) is a known global crisis. The World Health Organization reports a global 2.09 million incidences and 627,000 deaths in 2018 relating to BC. The traditional BC screening method in developed countries is mammography, whilst developing countries employ breast self-examina...


Bibliographic Details
Main authors: Salod, Zakia; Singh, Yashik
Format: Online Article Text
Language: English
Published: PAGEPress Publications, Pavia, Italy 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6902303/
https://www.ncbi.nlm.nih.gov/pubmed/31857990
http://dx.doi.org/10.4081/jphr.2019.1677
author Salod, Zakia
Singh, Yashik
collection PubMed
description Background: Breast Cancer (BC) is a known global crisis. The World Health Organization reports a global 2.09 million incidences and 627,000 deaths in 2018 relating to BC. The traditional BC screening method in developed countries is mammography, whilst developing countries employ breast self-examination and clinical breast examination. The prominent gold standard for BC detection is triple assessment: i) clinical examination; ii) mammography and/or ultrasonography; and iii) Fine Needle Aspirate Cytology. However, the introduction of cheaper, efficient and noninvasive methods of BC screening and detection would be beneficial. Design and methods: We propose the use of eight machine learning algorithms: i) Logistic Regression; ii) Support Vector Machine; iii) K-Nearest Neighbors; iv) Decision Tree; v) Random Forest; vi) Adaptive Boosting; vii) Gradient Boosting; viii) eXtreme Gradient Boosting, and blood test results using BC Coimbra Dataset (BCCD) from University of California Irvine online database to create models for BC prediction. To ensure the models’ robustness, we will employ: i) Stratified k-fold Cross-Validation; ii) Correlation-based Feature Selection (CFS); and iii) parameter tuning. The models will be validated on validation and test sets of BCCD for full features and reduced features. Feature reduction has an impact on algorithm performance. Seven metrics will be used for model evaluation, including accuracy. Expected impact of the study for public health: The CFS together with highest performing model(s) can serve to identify important specific blood tests that point towards BC, which may serve as an important BC biomarker. Highest performing model(s) may eventually be used to create an Artificial Intelligence tool to assist clinicians in BC screening and detection.
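The stratified k-fold cross-validation named in the description can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function name `stratified_k_fold`, the round-robin assignment, the seed, and the toy labels are all assumptions made for illustration, and the labels are not the real BCCD data. The idea shown is only that each fold keeps roughly the same class ratio as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror the full data.

    Hypothetical helper for illustration; not taken from the protocol.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds
    # so every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the held-out set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels (1 = BC, 0 = control); assumed values, not the BCCD data.
labels = [1] * 12 + [0] * 8
splits = list(stratified_k_fold(labels, k=4))
for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)  # prints "15 5 3" for each fold
```

With 12 positive and 8 negative toy labels and k = 4, every held-out fold contains 3 positives and 2 negatives, matching the 60/40 ratio of the whole set. A model-comparison study would fit and score each candidate algorithm on these same splits so that the metrics are comparable across algorithms.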
format Online
Article
Text
id pubmed-6902303
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PAGEPress Publications, Pavia, Italy
record_format MEDLINE/PubMed
spelling pubmed-6902303 2019-12-19 Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol Salod, Zakia Singh, Yashik J Public Health Res Study Protocol PAGEPress Publications, Pavia, Italy 2019-12-04 /pmc/articles/PMC6902303/ /pubmed/31857990 http://dx.doi.org/10.4081/jphr.2019.1677 Text en ©Copyright: the Author(s), 2019 http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (by-nc 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
title Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol
topic Study Protocol