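Construction of a Lifestyle Behavior-Related Lung Cancer Risk Prediction Model in a Large-Scale Population Cohort (大规模人群队列生活行为方式相关的肺癌风险预测模型的构建)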
OBJECTIVE: To identify lifestyle-related risk factors for lung cancer incidence and to build a risk prediction model that identifies high-risk individuals in the population and facilitates early detection of lung cancer. METHODS: The data used...
Format: | Online Article Text |
---|---|
Language: | English |
Published: | 四川大学学报(医学版)编辑部 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10579072/ https://www.ncbi.nlm.nih.gov/pubmed/37866943 http://dx.doi.org/10.12182/20230960209 |
_version_ | 1785121642652893184 |
---|---|
collection | PubMed |
description | OBJECTIVE: To identify lifestyle-related risk factors for lung cancer incidence and to build a risk prediction model that identifies high-risk individuals in the population and facilitates early detection of lung cancer. METHODS: Data were obtained from the UK Biobank, which collected information from 502389 participants between March 2006 and October 2010. Criteria for identifying the high-risk population were determined from domestic and international lung cancer screening guidelines and high-quality research literature on lung cancer risk factors. Univariate Cox regression was used to screen for risk factors, and a multivariable risk prediction model was constructed using Cox proportional hazards regression. The best-fitting model satisfying the proportional hazards assumption was selected by comparing Akaike information criterion values and Schoenfeld residual test results. Multivariable Cox proportional hazards regression was used to account for survival time, and the population was randomly divided into a training set and a validation set at a ratio of 7:3. The model was built on the training set and internally validated on the validation set. The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate model performance. The population was categorized into low-risk, moderate-risk, and high-risk groups based on a probability of occurrence of 0% to <25%, 25% to <75%, and 75% to 100%, respectively, and the proportion of affected individuals in each risk group was calculated. RESULTS: The study ultimately included 453558 individuals; over a cumulative follow-up of 5505402 person-years, 2330 cases of lung cancer were diagnosed. Cox proportional hazards regression identified 10 independent variables as predictors of lung cancer: age, body mass index (BMI), education, income, physical activity, smoking status, alcohol consumption frequency, fresh fruit intake, family history of cancer, and tobacco exposure; a model was established accordingly. Internal validation showed that 8 of these variables (all except BMI and fresh fruit intake) were significant influencing factors of lung cancer (P<0.05). In the training set, the AUCs for predicting lung cancer occurrence at one, five, and ten years were 0.825, 0.785, and 0.777, respectively; in the validation set, they were 0.857, 0.782, and 0.765, respectively. Screening the high-risk population could identify 68.38% of the individuals who might develop lung cancer in the future. CONCLUSION: This study established a model for predicting lung cancer risk associated with lifestyle behaviors in a large population. The model shows good discriminatory ability and can be used as a tool for developing standardized screening strategies for lung cancer. |
format | Online Article Text |
id | pubmed-10579072 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | 四川大学学报(医学版)编辑部 |
record_format | MEDLINE/PubMed |
spelling | pubmed-10579072 2023-10-18 大规模人群队列生活行为方式相关的肺癌风险预测模型的构建 Sichuan Da Xue Xue Bao Yi Xue Ban 大数据与人工智能技术在生物医学多场景的应用 四川大学学报(医学版)编辑部 2023-09-20 /pmc/articles/PMC10579072/ /pubmed/37866943 http://dx.doi.org/10.12182/20230960209 Text en © 2023 《四川大学学报(医学版)》编辑部. All rights reserved. Open Access: This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). In other words, the full-text content of the journal is made freely available for third-party users to copy and redistribute in any medium or format, and to remix, transform, and build upon the content of the journal. You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may not use the content of the journal for commercial purposes. For more information about the license, visit https://creativecommons.org/licenses/by-nc/4.0/ |
spellingShingle | 大数据与人工智能技术在生物医学多场景的应用 大规模人群队列生活行为方式相关的肺癌风险预测模型的构建 |
title | 大规模人群队列生活行为方式相关的肺癌风险预测模型的构建 |
title_full | 大规模人群队列生活行为方式相关的肺癌风险预测模型的构建 |
title_fullStr | 大规模人群队列生活行为方式相关的肺癌风险预测模型的构建 |
title_full_unstemmed | 大规模人群队列生活行为方式相关的肺癌风险预测模型的构建 |
title_short | 大规模人群队列生活行为方式相关的肺癌风险预测模型的构建 |
title_sort | 大规模人群队列生活行为方式相关的肺癌风险预测模型的构建 |
topic | 大数据与人工智能技术在生物医学多场景的应用 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10579072/ https://www.ncbi.nlm.nih.gov/pubmed/37866943 http://dx.doi.org/10.12182/20230960209 |
work_keys_str_mv | AT dàguīmórénqúnduìlièshēnghuóxíngwèifāngshìxiāngguāndefèiáifēngxiǎnyùcèmóxíngdegòujiàn AT dàguīmórénqúnduìlièshēnghuóxíngwèifāngshìxiāngguāndefèiáifēngxiǎnyùcèmóxíngdegòujiàn AT dàguīmórénqúnduìlièshēnghuóxíngwèifāngshìxiāngguāndefèiáifēngxiǎnyùcèmóxíngdegòujiàn AT dàguīmórénqúnduìlièshēnghuóxíngwèifāngshìxiāngguāndefèiáifēngxiǎnyùcèmóxíngdegòujiàn AT dàguīmórénqúnduìlièshēnghuóxíngwèifāngshìxiāngguāndefèiáifēngxiǎnyùcèmóxíngdegòujiàn |
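
The evaluation steps named in the abstract (AUC, the three risk bands, and the share of cases falling in each band) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and toy data are hypothetical, the AUC here is a plain rank-based estimate rather than a time-dependent one, and the cut-offs are read literally as thresholds on a predicted probability, although the abstract does not say whether they are absolute probabilities or percentiles of the risk score.

```python
from typing import Dict, Sequence


def risk_group(p: float) -> str:
    """Map a predicted probability to the three bands quoted in the abstract.

    Assumption: 0% to <25% is low, 25% to <75% is moderate, 75% to 100% is high.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p < 0.25:
        return "low"
    if p < 0.75:
        return "moderate"
    return "high"


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: chance that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


def case_share_by_group(
    scores: Sequence[float], labels: Sequence[int]
) -> Dict[str, float]:
    """Fraction of all observed cases that falls into each risk band."""
    counts = {"low": 0, "moderate": 0, "high": 0}
    for s, y in zip(scores, labels):
        if y == 1:
            counts[risk_group(s)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cases in the data")
    return {group: n / total for group, n in counts.items()}


# Hypothetical toy data: predicted probabilities and observed outcomes (1 = case).
toy_scores = [0.05, 0.10, 0.30, 0.60, 0.80, 0.90, 0.20, 0.85]
toy_labels = [0, 0, 0, 1, 1, 1, 0, 0]

if __name__ == "__main__":
    print("AUC:", round(auc(toy_scores, toy_labels), 3))
    print("Case share by group:", case_share_by_group(toy_scores, toy_labels))
```

The "high" entry returned by `case_share_by_group` is the toy counterpart of the 68.38% figure reported in the abstract: the proportion of eventual cases that screening only the high-risk band would capture.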