
Robust Estimation of Breast Cancer Incidence Risk in Presence of Incomplete or Inaccurate Information

PURPOSE: To evaluate the robustness of multiple machine learning classifiers for breast cancer risk estimation in the presence of incomplete or inaccurate information. DATA AND METHODS: Open data for this study was obtained from the BCSC Data Resource (http://breastscreening.cancer.gov/). We conduct...


Bibliographic Details
Main authors:	Kakileti, Siva Teja; Manjunath, Geetha; Dekker, Andre; Wee, Leonard
Format:	Online Article Text
Language:	English
Published:	West Asia Organization for Cancer Prevention 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7771951/ https://www.ncbi.nlm.nih.gov/pubmed/32856859 http://dx.doi.org/10.31557/APJCP.2020.21.8.2307

author	Kakileti, Siva Teja; Manjunath, Geetha; Dekker, Andre; Wee, Leonard
author_sort	Kakileti, Siva Teja
collection	PubMed
description	PURPOSE: To evaluate the robustness of multiple machine learning classifiers for breast cancer risk estimation in the presence of incomplete or inaccurate information. DATA AND METHODS: Open data for this study were obtained from the BCSC Data Resource (http://breastscreening.cancer.gov/). We conducted two ablation-type experiments to compare the robustness of different classifiers: in one, we randomly switched known information to missing with probability p(m); in the other, we randomly corrupted the existing information with probability p(c). We considered three prominent machine-learning classifiers, namely logistic regression (LR), random forests (RF), and a custom neural network (NN) architecture, and compared the degradation of their discrimination performance as the probability of missing or inaccurate data increased. RESULTS: LR, RF, and the custom NN achieved an area under the curve (AUC) of 0.645, 0.643, and 0.649, respectively, on a test set with 500,000 total observations. When we varied the probabilities p(m) and p(c) from 0 to 1, the NN achieved a higher AUC than RF and LR as long as less than half the data was missing or inaccurate (that is, for p(m) < 0.5 and p(c) < 0.5). For missing or corruption probabilities above 0.5, however, LR performed similarly to the custom NN. RF performed worse overall when the data had additional missing or incorrect entries. CONCLUSION: When the input information is missing or inaccurate, our experiments show that the proposed custom NN provides reliable risk estimates in medical datasets like BCSC. These results are particularly important in health care applications where not every attribute of the individual participant might be available.
format	Online Article Text
id	pubmed-7771951
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	West Asia Organization for Cancer Prevention
record_format	MEDLINE/PubMed
spelling	pubmed-7771951 2021-02-06 Robust Estimation of Breast Cancer Incidence Risk in Presence of Incomplete or Inaccurate Information. Kakileti, Siva Teja; Manjunath, Geetha; Dekker, Andre; Wee, Leonard. Asian Pac J Cancer Prev. Research Article. West Asia Organization for Cancer Prevention 2020-08 /pmc/articles/PMC7771951/ /pubmed/32856859 http://dx.doi.org/10.31557/APJCP.2020.21.8.2307 Text en This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Robust Estimation of Breast Cancer Incidence Risk in Presence of Incomplete or Inaccurate Information
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7771951/ https://www.ncbi.nlm.nih.gov/pubmed/32856859 http://dx.doi.org/10.31557/APJCP.2020.21.8.2307
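
The two ablation-type experiments described in the abstract (switching known values to missing with probability p(m), and corrupting values with probability p(c), then tracking AUC) can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' code: the data are synthetic rather than BCSC data, `toy_score` is a hypothetical stand-in for a trained classifier, and the `MISSING` marker, the uniform corruption scheme, and the middle-level imputation are choices of this sketch only.

```python
import random

MISSING = None  # hypothetical marker for an unknown attribute value


def mask_missing(rows, p_m, rng):
    """Set each attribute to MISSING independently with probability p_m."""
    return [[MISSING if rng.random() < p_m else v for v in row] for row in rows]


def corrupt(rows, p_c, levels, rng):
    """With probability p_c, replace an attribute by a different valid level."""
    out = []
    for row in rows:
        new_row = []
        for j, v in enumerate(row):
            if rng.random() < p_c:
                new_row.append(rng.choice([c for c in levels[j] if c != v]))
            else:
                new_row.append(v)
        out.append(new_row)
    return out


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def toy_score(row):
    """Stand-in risk score: attribute sum, with MISSING imputed as level 1."""
    return sum(1 if v is MISSING else v for v in row)


rng = random.Random(0)
levels = [[0, 1, 2]] * 3  # three synthetic categorical attributes
rows = [[rng.choice(lv) for lv in levels] for _ in range(400)]
# Synthetic outcome: risk rises with the attribute sum (not BCSC data).
labels = [1 if rng.random() < 0.1 + 0.1 * sum(r) else 0 for r in rows]

# Sweep the missing/corruption probability and report the AUC at each step.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    auc_m = auc(labels, [toy_score(r) for r in mask_missing(rows, p, rng)])
    auc_c = auc(labels, [toy_score(r) for r in corrupt(rows, p, levels, rng)])
    print(f"p={p:.2f}  AUC(missing)={auc_m:.3f}  AUC(corrupted)={auc_c:.3f}")
```

In a real comparison, `toy_score` would be replaced by the predicted probabilities of each fitted model (LR, RF, NN), and the same perturbed test set would be scored by all three so that their AUC curves are directly comparable.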
