Prediction of lung cancer risk in Chinese population with genetic‐environment factor using extreme gradient boosting

BACKGROUND: Detecting early‐stage lung cancer is critical to reduce the lung cancer mortality rate; however, existing models based on germline variants perform poorly, and new models are needed. This study aimed to use extreme gradient boosting to develop a predictive model for the early diagnosis o...

Bibliographic Details
Main authors: Li, Yutao, Zou, Zixiu, Gao, Zhunyi, Wang, Yi, Xiao, Man, Xu, Chang, Jiang, Gengxi, Wang, Haijian, Jin, Li, Wang, Jiucun, Wang, Huai Zhou, Guo, Shicheng, Wu, Junjie
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9741969/
https://www.ncbi.nlm.nih.gov/pubmed/35499292
http://dx.doi.org/10.1002/cam4.4800
author Li, Yutao
Zou, Zixiu
Gao, Zhunyi
Wang, Yi
Xiao, Man
Xu, Chang
Jiang, Gengxi
Wang, Haijian
Jin, Li
Wang, Jiucun
Wang, Huai Zhou
Guo, Shicheng
Wu, Junjie
author_sort Li, Yutao
collection PubMed
description BACKGROUND: Detecting early‐stage lung cancer is critical to reducing lung cancer mortality; however, existing models based on germline variants perform poorly, and new models are needed. This study aimed to use extreme gradient boosting to develop a predictive model for the early diagnosis of lung cancer in a multicenter case–control study.
MATERIALS AND METHODS: A total of 974 cases and 1005 controls were recruited in Shanghai and Taizhou, and 61 single nucleotide polymorphisms (SNPs) were genotyped. Multivariate logistic regression was used to calculate the association between signal SNPs and lung cancer risk. Logistic regression (LR) and extreme gradient boosting (XGBoost), a large‐scale machine learning algorithm, were used to build the lung cancer risk models. For both models, 10‐fold cross‐validation was performed, and predictive performance was evaluated by the area under the curve (AUC).
RESULTS: After FDR adjustment, TYMS rs3819102 and BAG6 rs1077393 were significantly associated with lung cancer risk (p < 0.05). For lung cancer risk prediction, the model using only epidemiological factors attained an AUC of 0.703 for LR and 0.744 for XGBoost. Compared with the epidemiology‐only LR model, adding SNPs and applying XGBoost increased the AUC to 0.759 (p < 0.001). BAG6 rs1077393 was the most important predictor among all SNPs in the XGBoost lung cancer prediction model, followed by TERT rs2735845 and CAMKK1 rs7214723. In the analysis stratified by histology, performance for lung adenocarcinoma (ADC) improved significantly, from 0.639 to 0.699 (p = 0.009), when XGBoost was applied and SNPs were added to the model, whereas the best model for lung squamous cell carcinoma (SCC) was the LR model with epidemiological factors and SNPs (AUC = 0.833), compared with the XGBoost model (AUC = 0.816).
CONCLUSION: Our lung cancer risk prediction models in the Chinese population show strong predictive ability, especially for SCC. Adding SNPs and applying the XGBoost algorithm to the epidemiology‐based logistic regression risk prediction model significantly improves model performance.
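The evaluation protocol named in the abstract (10‐fold cross‐validation scored by AUC) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the synthetic two‐feature data, the helper names (`auc`, `k_fold_indices`, `fit_centroid_scorer`, `cross_validated_auc`), the simple centroid‐difference scorer that stands in for LR or XGBoost, and the choice to pool out‐of‐fold scores into one AUC rather than average per‐fold AUCs. The record does not say which variant the authors used.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels are 0 (control) or 1 (case); tied scores share an average rank.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one control")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_centroid_scorer(rows, labels):
    """Hypothetical stand-in model: project onto (case mean - control mean)."""
    dims = len(rows[0])
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    w = [
        sum(r[d] for r in pos) / len(pos) - sum(r[d] for r in neg) / len(neg)
        for d in range(dims)
    ]
    return lambda row: sum(wi * xi for wi, xi in zip(w, row))


def cross_validated_auc(rows, labels, fit, k=10, seed=0):
    """Score every sample with a model that never saw it, then pool into one AUC."""
    scores = [0.0] * len(rows)
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        scorer = fit([rows[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            scores[i] = scorer(rows[i])
    return auc(labels, scores)


# Synthetic data (assumption): 100 cases and 100 controls, two noisy features.
rng = random.Random(42)
labels = [1] * 100 + [0] * 100
rows = [[rng.gauss(1.5 * y, 1.0), rng.gauss(1.5 * y, 1.0)] for y in labels]
cv_auc = cross_validated_auc(rows, labels, fit_centroid_scorer)
print(f"pooled 10-fold AUC: {cv_auc:.3f}")
```

Any model exposing the same `fit(rows, labels) -> scorer` shape could be passed to `cross_validated_auc` in place of the stand‐in, which is how two model families would be compared on identical folds.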
format Online
Article
Text
id pubmed-9741969
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9741969 2022-12-13 Cancer Med RESEARCH ARTICLES John Wiley and Sons Inc. 2022-05-02 /pmc/articles/PMC9741969/ /pubmed/35499292 http://dx.doi.org/10.1002/cam4.4800 Text en © 2022 The Authors. Cancer Medicine published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Prediction of lung cancer risk in Chinese population with genetic‐environment factor using extreme gradient boosting
topic RESEARCH ARTICLES
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9741969/
https://www.ncbi.nlm.nih.gov/pubmed/35499292
http://dx.doi.org/10.1002/cam4.4800