Machine Learning and Real-World Data to Predict Lung Cancer Risk in Routine Care

Bibliographic Details
Main authors: Chandran, Urmila; Reps, Jenna; Yang, Robert; Vachani, Anil; Maldonado, Fabien; Kalsekar, Iftekhar
Format: Online Article Text
Language: English
Published: American Association for Cancer Research 2023
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9986687/
https://www.ncbi.nlm.nih.gov/pubmed/36576991
http://dx.doi.org/10.1158/1055-9965.EPI-22-0873
author Chandran, Urmila
Reps, Jenna
Yang, Robert
Vachani, Anil
Maldonado, Fabien
Kalsekar, Iftekhar
collection PubMed
description BACKGROUND: This study used machine learning to develop a 3-year lung cancer risk prediction model with large real-world data in a mostly younger population. METHODS: Over 4.7 million individuals, aged 45 to 65 years with no history of any cancer or lung cancer screening, diagnostic, or treatment procedures, with an outpatient visit in 2013 were identified in Optum's de-identified Electronic Health Record (EHR) dataset. A least absolute shrinkage and selection operator model was fit using all available data in the 365 days prior. Temporal validation was assessed with recent data. External validation was assessed with data from Mercy Health Systems EHR and Optum's de-identified Clinformatics Data Mart Database. Racial inequities in model discrimination were assessed with xAUCs. RESULTS: The model AUC was 0.76. Top predictors included age, smoking, race, ethnicity, and diagnosis of chronic obstructive pulmonary disease. The model identified a high-risk group with lung cancer incidence 9 times the average cohort incidence, representing 10% of patients with lung cancer. Model performed well temporally and externally, while performance was reduced for Asians and Hispanics. CONCLUSIONS: A high-dimensional model trained using big data identified a subset of patients with high lung cancer risk. The model demonstrated transportability to EHR and claims data, while underscoring the need to assess racial disparities when using machine learning methods. IMPACT: This internally and externally validated real-world data-based lung cancer prediction model is available on an open-source platform for broad sharing and application. Model integration into an EHR system could minimize physician burden by automating identification of high-risk patients.
format Online
Article
Text
id pubmed-9986687
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Association for Cancer Research
record_format MEDLINE/PubMed
spelling pubmed-9986687 2023-03-07 Cancer Epidemiol Biomarkers Prev Research Articles American Association for Cancer Research 2023-03-06 2022-12-28 /pmc/articles/PMC9986687/ /pubmed/36576991 http://dx.doi.org/10.1158/1055-9965.EPI-22-0873 Text en ©2022 The Authors; Published by the American Association for Cancer Research https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
title Machine Learning and Real-World Data to Predict Lung Cancer Risk in Routine Care
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9986687/
https://www.ncbi.nlm.nih.gov/pubmed/36576991
http://dx.doi.org/10.1158/1055-9965.EPI-22-0873
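
Note on the high-risk group reported in the abstract: the description states that the group had a lung cancer incidence 9 times the cohort average and represented 10% of patients with lung cancer. Assuming "represented 10%" means the group contains 10% of all lung cancer cases, the size of the group relative to the cohort follows by arithmetic, since the incidence ratio equals the share of cases captured divided by the share of the cohort flagged. The sketch below is illustrative only and is not taken from the article; the function name and the use of 4.7 million as the cohort size (the abstract says "over 4.7 million") are assumptions.

```python
def high_risk_share(incidence_ratio: float, case_capture: float) -> float:
    """Return the fraction of the cohort that falls in the high-risk group.

    incidence_ratio: incidence in the group divided by incidence in the cohort.
    case_capture:    fraction of all cases that fall inside the group.

    Because incidence_ratio = case_capture / share, it follows that
    share = case_capture / incidence_ratio.
    """
    if incidence_ratio <= 0:
        raise ValueError("incidence_ratio must be positive")
    if not 0.0 <= case_capture <= 1.0:
        raise ValueError("case_capture must be between 0 and 1")
    return case_capture / incidence_ratio


# Figures from the abstract: 9 times the average incidence, 10% of cases.
share = high_risk_share(incidence_ratio=9.0, case_capture=0.10)

# Assumed cohort size for illustration (the abstract says "over 4.7 million").
assumed_cohort_size = 4_700_000
approx_people = round(share * assumed_cohort_size)

print(f"High-risk group share of cohort: {share:.2%}")
print(f"Approximate group size under the assumed cohort: {approx_people}")
```

Under these assumptions the high-risk group is roughly 1.1% of the cohort, which is why a model with moderate overall discrimination can still flag a small, concentrated subset of patients.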