
Development and internal-external validation of statistical and machine learning models for breast cancer prognostication: cohort study

Bibliographic Details
Main Authors:	Clift, Ash Kieran; Dodwell, David; Lord, Simon; Petrou, Stavros; Brady, Michael; Collins, Gary S; Hippisley-Cox, Julia
Format:	Online Article Text
Language:	English
Published:	BMJ Publishing Group Ltd. 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10170264/ https://www.ncbi.nlm.nih.gov/pubmed/37164379 http://dx.doi.org/10.1136/bmj-2022-073800

author	Clift, Ash Kieran; Dodwell, David; Lord, Simon; Petrou, Stavros; Brady, Michael; Collins, Gary S; Hippisley-Cox, Julia
collection	PubMed
description	OBJECTIVE: To develop a clinically useful model that estimates the 10 year risk of breast cancer related mortality in women (self-reported female sex) with breast cancer of any stage, comparing results from regression and machine learning approaches. DESIGN: Population based cohort study. SETTING: QResearch primary care database in England, with individual level linkage to the national cancer registry, Hospital Episodes Statistics, and national mortality registers. PARTICIPANTS: 141 765 women aged 20 years and older with a diagnosis of invasive breast cancer between 1 January 2000 and 31 December 2020. MAIN OUTCOME MEASURES: Four model building strategies comprising two regression (Cox proportional hazards and competing risks regression) and two machine learning (XGBoost and an artificial neural network) approaches. Internal-external cross validation was used for model evaluation. Random effects meta-analysis that pooled estimates of discrimination and calibration metrics, calibration plots, and decision curve analysis were used to assess model performance, transportability, and clinical utility. RESULTS: During a median 4.16 years (interquartile range 1.76-8.26) of follow-up, 21 688 breast cancer related deaths and 11 454 deaths from other causes occurred. Restricting to 10 years maximum follow-up from breast cancer diagnosis, 20 367 breast cancer related deaths occurred during a total of 688 564.81 person years. The crude breast cancer mortality rate was 295.79 per 10 000 person years (95% confidence interval 291.75 to 299.88). Predictors varied for each regression model, but both Cox and competing risks models included age at diagnosis, body mass index, smoking status, route to diagnosis, hormone receptor status, cancer stage, and grade of breast cancer. 
The Cox model’s random effects meta-analysis pooled estimate for Harrell’s C index was the highest of any model at 0.858 (95% confidence interval 0.853 to 0.864, and 95% prediction interval 0.843 to 0.873). It appeared acceptably calibrated on calibration plots. The competing risks regression model had good discrimination: pooled Harrell’s C index 0.849 (0.839 to 0.859, and 0.821 to 0.876), and evidence of systematic miscalibration on summary metrics was lacking. The machine learning models had acceptable discrimination overall (Harrell’s C index: XGBoost 0.821 (0.813 to 0.828, and 0.805 to 0.837); neural network 0.847 (0.835 to 0.858, and 0.816 to 0.878)), but had more complex patterns of miscalibration and more variable regional and stage specific performance. Decision curve analysis suggested that the Cox and competing risks regression models tested may have higher clinical utility than the two machine learning approaches. CONCLUSION: In women with breast cancer of any stage, using the predictors available in this dataset, regression based methods had better and more consistent performance compared with machine learning approaches and may be worthy of further evaluation for potential clinical use, such as for stratified follow-up.
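Two quantities in the abstract above can be illustrated with a short sketch. The first part recomputes the crude breast cancer mortality rate from the reported deaths and person years; the confidence interval uses a log-scale Wald interval for a Poisson rate, which is an assumption here (the abstract does not state how the interval was obtained), although it lands close to the reported 291.75 to 299.88. The second part is a minimal, unoptimised version of Harrell's C index for right-censored data; the function name `harrell_c` and the toy data are hypothetical, and this is not the authors' code.

```python
import math

# --- Crude mortality rate per 10 000 person years (figures from the abstract) ---
deaths = 20_367             # breast cancer related deaths within 10 years
person_years = 688_564.81   # total follow-up time

rate = deaths / person_years * 10_000

# Assumption: log-scale Wald interval for a Poisson rate,
# i.e. rate * exp(+/- z / sqrt(deaths)). The abstract does not name the method.
z = 1.959964
half_width = z / math.sqrt(deaths)
lower = rate * math.exp(-half_width)
upper = rate * math.exp(half_width)


# --- Harrell's C index (hypothetical helper, O(n^2) for clarity) ---
def harrell_c(times, events, risks):
    """Share of comparable pairs in which the subject who died earlier
    was given the higher predicted risk. A pair (i, j) is comparable when
    subject i had an observed event strictly before subject j's time.
    Tied predicted risks count as half a concordant pair."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (illustrative only): follow-up times, event flags, predicted risks.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.5, 0.1]

if __name__ == "__main__":
    print(f"crude rate: {rate:.2f} per 10 000 person years "
          f"({lower:.2f} to {upper:.2f})")
    print("toy C index:", harrell_c(toy_times, toy_events, toy_risks))
```

A C index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the pooled estimates of 0.821 to 0.858 above should be read.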
format	Online Article Text
id	pubmed-10170264
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BMJ Publishing Group Ltd.
record_format	MEDLINE/PubMed
spelling	pubmed-10170264 2023-05-11 BMJ Research BMJ Publishing Group Ltd. 2023-05-10 /pmc/articles/PMC10170264/ /pubmed/37164379 http://dx.doi.org/10.1136/bmj-2022-073800 Text en © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY. No commercial re-use. See rights and permissions. Published by BMJ. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
title	Development and internal-external validation of statistical and machine learning models for breast cancer prognostication: cohort study
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10170264/ https://www.ncbi.nlm.nih.gov/pubmed/37164379 http://dx.doi.org/10.1136/bmj-2022-073800
