
Assessing eligibility for lung cancer screening using parsimonious ensemble machine learning models: A development and validation study

BACKGROUND: Risk-based screening for lung cancer is currently being considered in several countries; however, the optimal approach to determine eligibility remains unclear. Ensemble machine learning could support the development of highly parsimonious prediction models that maintain the performance...


Bibliographic Details
Main authors:	Callender, Thomas; Imrie, Fergus; Cebere, Bogdan; Pashayan, Nora; Navani, Neal; van der Schaar, Mihaela; Janes, Sam M.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10547178/ https://www.ncbi.nlm.nih.gov/pubmed/37788223 http://dx.doi.org/10.1371/journal.pmed.1004287

author	Callender, Thomas; Imrie, Fergus; Cebere, Bogdan; Pashayan, Nora; Navani, Neal; van der Schaar, Mihaela; Janes, Sam M.
author_facet	Callender, Thomas; Imrie, Fergus; Cebere, Bogdan; Pashayan, Nora; Navani, Neal; van der Schaar, Mihaela; Janes, Sam M.
author_sort	Callender, Thomas
collection	PubMed
description	BACKGROUND: Risk-based screening for lung cancer is currently being considered in several countries; however, the optimal approach to determine eligibility remains unclear. Ensemble machine learning could support the development of highly parsimonious prediction models that maintain the performance of more complex models while maximising simplicity and generalisability, supporting the widespread adoption of personalised screening. In this work, we aimed to develop and validate ensemble machine learning models to determine eligibility for risk-based lung cancer screening. METHODS AND FINDINGS: For model development, we used data from 216,714 ever-smokers recruited between 2006 and 2010 to the UK Biobank prospective cohort and 26,616 high-risk ever-smokers recruited between 2002 and 2004 to the control arm of the US National Lung Screening Trial (NLST), a randomised controlled trial. The NLST randomised high-risk smokers from 33 US centres with at least a 30 pack-year smoking history and fewer than 15 quit-years to annual CT or chest radiography screening for lung cancer. We externally validated our models in the US Prostate, Lung, Colorectal and Ovarian (PLCO) Screening Trial, among 49,593 participants in the chest radiography arm and among all 80,659 ever-smoking participants. The PLCO trial, which recruited from 1993 to 2001, compared chest radiography with no chest radiography for lung cancer screening. We primarily validated in the PLCO chest radiography arm so that we could benchmark against comparator models developed within the PLCO control arm. Models were developed to predict the risk of 2 outcomes within 5 years from baseline: diagnosis of lung cancer and death from lung cancer. We assessed model discrimination (area under the receiver operating characteristic curve, AUC), calibration (calibration curves and expected/observed ratio), overall performance (Brier scores), and net benefit with decision curve analysis. 
Models predicting lung cancer death (UCL-D) and incidence (UCL-I) using 3 variables (age, smoking duration, and pack-years) achieved or exceeded parity in discrimination, overall performance, and net benefit with comparators currently in use, despite requiring only one-quarter of the predictors. In external validation in the PLCO trial, UCL-D had an AUC of 0.803 (95% CI: 0.783, 0.824) and was well calibrated with an expected/observed (E/O) ratio of 1.05 (95% CI: 0.95, 1.19). UCL-I had an AUC of 0.787 (95% CI: 0.771, 0.802) and an E/O ratio of 1.0 (95% CI: 0.92, 1.07). At 5-year risk thresholds of 0.68% and 1.17%, the sensitivities of UCL-D and UCL-I were 85.5% and 83.9%, respectively, which were 7.9% and 6.2% higher than the USPSTF-2021 criteria at the same specificity. The main limitation of this study is that the models have not been validated outside of UK and US cohorts. CONCLUSIONS: We present parsimonious ensemble machine learning models to predict the risk of lung cancer in ever-smokers, demonstrating a novel approach that could simplify the implementation of risk-based lung cancer screening in multiple settings.
format	Online Article Text
id	pubmed-10547178
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-10547178 2023-10-04 PLoS Med Research Article Public Library of Science 2023-10-03 /pmc/articles/PMC10547178/ /pubmed/37788223 http://dx.doi.org/10.1371/journal.pmed.1004287 Text en © 2023 Callender et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Assessing eligibility for lung cancer screening using parsimonious ensemble machine learning models: A development and validation study
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10547178/ https://www.ncbi.nlm.nih.gov/pubmed/37788223 http://dx.doi.org/10.1371/journal.pmed.1004287
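
The description above names five validation measures: AUC, the expected/observed (E/O) ratio, the Brier score, sensitivity and specificity at a risk threshold, and net benefit from decision curve analysis. The sketch below shows one common way to compute each of them. It is not the authors' code: the eight outcomes and predicted risks are invented toy values, the 0.15 threshold is arbitrary, and every function name is hypothetical. The definitions used are the standard textbook ones, which the abstract does not spell out.

```python
"""Toy illustration of the validation measures named in the abstract (not the authors' code)."""

# Hypothetical outcomes: 1 = event within 5 years, 0 = no event.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
# Hypothetical predicted 5-year risks for the same eight people.
risk = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.60]


def auc(y, p):
    """Discrimination: chance that a random case outranks a random non-case (ties count 0.5)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def expected_observed_ratio(y, p):
    """Calibration in the large: sum of predicted risks divided by the observed event count."""
    return sum(p) / sum(y)


def brier_score(y, p):
    """Overall performance: mean squared difference between predicted risk and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def confusion_counts(y, p, threshold):
    """Count (tp, fp, tn, fn), treating risk >= threshold as screening-eligible."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    return tp, fp, tn, fn


def sensitivity_specificity(y, p, threshold):
    """Sensitivity and specificity of the eligibility rule at the given risk threshold."""
    tp, fp, tn, fn = confusion_counts(y, p, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(y, p, threshold):
    """Decision-curve net benefit: TP/n - FP/n * threshold / (1 - threshold)."""
    tp, fp, _, _ = confusion_counts(y, p, threshold)
    n = len(y)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    threshold = 0.15  # arbitrary toy threshold, not one of the thresholds in the study
    sens, spec = sensitivity_specificity(y_true, risk, threshold)
    print("AUC:", auc(y_true, risk))
    print("E/O ratio:", expected_observed_ratio(y_true, risk))
    print("Brier score:", brier_score(y_true, risk))
    print("Sensitivity, specificity:", sens, spec)
    print("Net benefit:", net_benefit(y_true, risk, threshold))
```

An E/O ratio near 1 means the predicted number of events matches the observed number, which is how the reported values of 1.05 and 1.0 are read. The sketch uses plain Python and an all-pairs AUC for transparency; on cohorts the size of those in the record, a rank-based implementation from a statistics library would be the practical choice.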