
Predicting breast cancer risk using personal health data and machine learning models

Bibliographic Details
Main Authors: Stark, Gigi F.; Hart, Gregory R.; Nartowt, Bradley J.; Deng, Jun
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6934281/
https://www.ncbi.nlm.nih.gov/pubmed/31881042
http://dx.doi.org/10.1371/journal.pone.0226765
author Stark, Gigi F.
Hart, Gregory R.
Nartowt, Bradley J.
Deng, Jun
collection PubMed
description Among women, breast cancer is a leading cause of death. Breast cancer risk predictions can inform screening and preventative actions. Previous works found that adding inputs to the widely used Gail model improved its ability to predict breast cancer risk. However, these models used simple statistical architectures and the additional inputs were derived from costly and/or invasive procedures. By contrast, we developed machine learning models that used highly accessible personal health data to predict five-year breast cancer risk. We created machine learning models using only the Gail model inputs and models using both Gail model inputs and additional personal health data relevant to breast cancer risk. For both sets of inputs, six machine learning models were trained and evaluated on the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial data set. The area under the receiver operating characteristic curve metric quantified each model’s performance. Since this data set has a small percentage of positive breast cancer cases, we also reported sensitivity, specificity, and precision. We used DeLong tests (p < 0.05) to compare the testing data set performance of each machine learning model to that of the Breast Cancer Risk Prediction Tool (BCRAT), an implementation of the Gail model. None of the machine learning models with only BCRAT inputs were significantly stronger than the BCRAT. However, the logistic regression, linear discriminant analysis, and neural network models with the broader set of inputs were all significantly stronger than the BCRAT. These results suggest that relative to the BCRAT, additional easy-to-obtain personal health inputs can improve five-year breast cancer risk prediction. Our models could be used as non-invasive and cost-effective risk stratification tools to increase early breast cancer detection and prevention, motivating both immediate actions like screening and long-term preventative measures such as hormone replacement therapy and chemoprevention.
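The evaluation metrics named in the description (area under the ROC curve, sensitivity, specificity, and precision) can be illustrated with a minimal sketch. This is not the authors' code: the function names `roc_auc` and `confusion_metrics`, the toy labels and scores, and the 0.065 decision threshold are all hypothetical, chosen only to mimic a data set with few positive cases. The DeLong test used in the study to compare models is not shown.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, and precision at a fixed risk threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
    }


# Hypothetical toy data: 2 positives among 10 cases (illustrative only,
# not drawn from the study's data set).
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]

auc = roc_auc(labels, scores)
metrics = confusion_metrics(labels, scores, threshold=0.065)
print(auc, metrics)
```

On imbalanced data such as this, a high AUC can coexist with modest precision, which is one reason to report the threshold-dependent metrics alongside it.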
format Online
Article
Text
id pubmed-6934281
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6934281 2020-01-07 PLoS One Research Article Public Library of Science 2019-12-27 /pmc/articles/PMC6934281/ /pubmed/31881042 http://dx.doi.org/10.1371/journal.pone.0226765 Text en © 2019 Stark et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Predicting breast cancer risk using personal health data and machine learning models
topic Research Article