An Improved Training Algorithm Based on Ensemble Penalized Cox Regression for Predicting Absolute Cancer Risk

INTRODUCTION: Biases in cancer incidence characteristics have led to significant imbalances in databases constructed by prospective cohort studies. Since they use imbalanced databases, many traditional algorithms for training cancer risk prediction models perform poorly. METHODS: To improve predicti...

Bibliographic Details
Main authors: Liu, Liyuan; Yang, Fu; Fan, Yeye; Kao, Chunyu; Wang, Fei; Yu, Lixiang; He, Yong; Ji, Jiadong; Yu, Zhigang
Format: Online Article Text
Language: English
Published: Editorial Office of CCDCW, Chinese Center for Disease Control and Prevention 2023
Subjects: Methods and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10061827/
https://www.ncbi.nlm.nih.gov/pubmed/37007865
http://dx.doi.org/10.46234/ccdcw2023.037
author Liu, Liyuan
Yang, Fu
Fan, Yeye
Kao, Chunyu
Wang, Fei
Yu, Lixiang
He, Yong
Ji, Jiadong
Yu, Zhigang
collection PubMed
description INTRODUCTION: Biases in cancer incidence characteristics have led to significant imbalances in databases constructed by prospective cohort studies. Because they are trained on these imbalanced databases, many traditional algorithms for building cancer risk prediction models perform poorly. METHODS: To improve prediction performance, we introduced a Bagging ensemble framework into an absolute risk model based on ensemble penalized Cox regression (EPCR). We then tested whether the EPCR model outperformed other traditional regression models by varying the censoring rate of the simulated data. RESULTS: Six different simulation studies were performed with 100 replicates. To assess model performance, we calculated the mean false discovery rate (FDR), false omission rate, true positive rate (TPR), true negative rate, and area under the receiver operating characteristic curve (AUC). We found that the EPCR procedure could reduce the FDR for important variables at the same TPR, thereby achieving more accurate variable screening. In addition, we used the EPCR procedure to build a breast cancer risk prediction model based on the Breast Cancer Cohort Study in Chinese Women database. AUCs for 3- and 5-year predictions were 0.691 and 0.642, representing improvements of 0.189 and 0.117 over the classical Gail model, respectively. DISCUSSION: We conclude that the EPCR procedure can overcome challenges posed by imbalanced data and improve the performance of cancer risk assessment tools.
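The variable-screening metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `screening_metrics`, the variable labels `x1`..`x10`, and the convention of returning 0.0 for an empty denominator are all assumptions made for illustration. It treats truly important variables as positives and the variables a model selects as predicted positives.

```python
def screening_metrics(true_vars, selected_vars, all_vars):
    """Compute FDR, FOR, TPR and TNR for a variable-selection result.

    Hypothetical helper, not taken from the paper. Positives are the truly
    important variables; predicted positives are the selected variables.
    """
    true_set = set(true_vars)
    selected = set(selected_vars)
    universe = set(all_vars)

    tp = len(selected & true_set)             # important and selected
    fp = len(selected - true_set)             # unimportant but selected
    fn = len(true_set - selected)             # important but missed
    tn = len(universe - selected - true_set)  # unimportant and not selected

    def ratio(num, den):
        # Assumed convention: an empty denominator yields 0.0.
        return num / den if den else 0.0

    return {
        "FDR": ratio(fp, fp + tp),  # false discovery rate
        "FOR": ratio(fn, fn + tn),  # false omission rate
        "TPR": ratio(tp, tp + fn),  # true positive rate
        "TNR": ratio(tn, tn + fp),  # true negative rate
    }


# Illustrative data only: 10 candidate variables, 4 truly important,
# and a model that selects 3 of them plus 1 unimportant variable.
all_vars = [f"x{i}" for i in range(1, 11)]
metrics = screening_metrics(
    true_vars=["x1", "x2", "x3", "x4"],
    selected_vars=["x1", "x2", "x3", "x7"],
    all_vars=all_vars,
)
print(metrics)
```

In a simulation with replicates, such per-replicate values would be averaged to obtain mean rates of the kind the abstract reports.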
format Online
Article
Text
id pubmed-10061827
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Editorial Office of CCDCW, Chinese Center for Disease Control and Prevention
record_format MEDLINE/PubMed
spelling pubmed-10061827 2023-03-31 China CDC Wkly Methods and Applications
Editorial Office of CCDCW, Chinese Center for Disease Control and Prevention 2023-03-03 /pmc/articles/PMC10061827/ /pubmed/37007865 http://dx.doi.org/10.46234/ccdcw2023.037 Text en
Copyright and License information: Editorial Office of CCDCW, Chinese Center for Disease Control and Prevention 2023. This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 Unported License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/
title An Improved Training Algorithm Based on Ensemble Penalized Cox Regression for Predicting Absolute Cancer Risk
topic Methods and Applications