Predicting breast cancer risk using personal health data and machine learning models
Main authors: | Stark, Gigi F.; Hart, Gregory R.; Nartowt, Bradley J.; Deng, Jun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6934281/ https://www.ncbi.nlm.nih.gov/pubmed/31881042 http://dx.doi.org/10.1371/journal.pone.0226765 |
_version_ | 1783483356895248384 |
---|---|
author | Stark, Gigi F.; Hart, Gregory R.; Nartowt, Bradley J.; Deng, Jun |
collection | PubMed |
description | Among women, breast cancer is a leading cause of death. Breast cancer risk predictions can inform screening and preventative actions. Previous works found that adding inputs to the widely used Gail model improved its ability to predict breast cancer risk. However, these models used simple statistical architectures, and the additional inputs were derived from costly and/or invasive procedures. By contrast, we developed machine learning models that used highly accessible personal health data to predict five-year breast cancer risk. We created machine learning models using only the Gail model inputs and models using both the Gail model inputs and additional personal health data relevant to breast cancer risk. For both sets of inputs, six machine learning models were trained and evaluated on the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial data set. Each model's performance was quantified by the area under the receiver operating characteristic curve (AUC). Since this data set has a small percentage of positive breast cancer cases, we also reported sensitivity, specificity, and precision. We used DeLong tests (p < 0.05) to compare the testing data set performance of each machine learning model to that of the Breast Cancer Risk Prediction Tool (BCRAT), an implementation of the Gail model. None of the machine learning models with only BCRAT inputs were significantly stronger than the BCRAT. However, the logistic regression, linear discriminant analysis, and neural network models with the broader set of inputs were all significantly stronger than the BCRAT. These results suggest that, relative to the BCRAT, additional easy-to-obtain personal health inputs can improve five-year breast cancer risk prediction.
Our models could be used as non-invasive and cost-effective risk stratification tools to increase early breast cancer detection and prevention, motivating both immediate actions like screening and long-term preventative measures such as hormone replacement therapy and chemoprevention. |
format | Online Article Text |
id | pubmed-6934281 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6934281 2020-01-07 PLoS One Research Article. Public Library of Science 2019-12-27 /pmc/articles/PMC6934281/ /pubmed/31881042 http://dx.doi.org/10.1371/journal.pone.0226765 Text en © 2019 Stark et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Predicting breast cancer risk using personal health data and machine learning models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6934281/ https://www.ncbi.nlm.nih.gov/pubmed/31881042 http://dx.doi.org/10.1371/journal.pone.0226765 |
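The abstract names the evaluation quantities (AUC, sensitivity, specificity, precision) and the DeLong test used to compare each model against the BCRAT, but it gives no code. The following is a minimal, hypothetical Python sketch of how those quantities are commonly computed when two models score the same test subjects. It is not the authors' implementation. The function names `auc`, `threshold_metrics` and `delong_test`, the decision threshold, and the toy inputs are illustrative assumptions. The DeLong variance uses the direct O(mn) form rather than a fast rank-based algorithm.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helper names for illustration; not taken from the paper.


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case outscores the negative, 0.5 on ties."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    return sum(_psi(x, y) for x in pos for y in neg) / (len(pos) * len(neg))


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float) -> Tuple[float, float, float]:
    """Sensitivity, specificity and precision when score >= threshold is called positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for t, s in pairs if t == 1 and s >= threshold)
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    tn = sum(1 for t, s in pairs if t == 0 and s < threshold)
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, specificity, precision


def delong_test(y_true: Sequence[int], scores_a: Sequence[float],
                scores_b: Sequence[float]) -> Tuple[float, float, float, float]:
    """Two-sided DeLong test for the difference of two correlated AUCs
    (both models scored on the same subjects). Returns (auc_a, auc_b, z, p)."""
    pos_idx = [i for i, t in enumerate(y_true) if t == 1]
    neg_idx = [i for i, t in enumerate(y_true) if t == 0]
    m, n = len(pos_idx), len(neg_idx)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")

    def components(scores):
        # Structural components: average kernel value per positive and per negative case.
        v10 = [sum(_psi(scores[i], scores[j]) for j in neg_idx) / n for i in pos_idx]
        v01 = [sum(_psi(scores[i], scores[j]) for i in pos_idx) / m for j in neg_idx]
        return v10, v01

    a10, a01 = components(scores_a)
    b10, b01 = components(scores_b)
    auc_a, auc_b = sum(a10) / m, sum(b10) / m  # the mean of v10 equals the AUC

    def cov(u, v, mu_u, mu_v):
        # Sample covariance with an (N - 1) denominator.
        return sum((x - mu_u) * (y - mu_v) for x, y in zip(u, v)) / (len(u) - 1)

    var_a = cov(a10, a10, auc_a, auc_a) / m + cov(a01, a01, auc_a, auc_a) / n
    var_b = cov(b10, b10, auc_b, auc_b) / m + cov(b01, b01, auc_b, auc_b) / n
    cov_ab = cov(a10, b10, auc_a, auc_b) / m + cov(a01, b01, auc_a, auc_b) / n
    var_diff = var_a + var_b - 2 * cov_ab
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero; the test is undefined")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the normal approximation
    return auc_a, auc_b, z, p
```

For example, `delong_test(labels, model_scores, bcrat_scores)` (hypothetical variable names) would return both AUCs, the z statistic, and a two-sided p-value. A p-value below 0.05 would match the significance criterion stated in the abstract.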