Violations of proportional hazard assumption in Cox regression model of transcriptomic data in TCGA pan-cancer cohorts
Main authors: | Zeng, Zihang; Gao, Yanping; Li, Jiali; Zhang, Gong; Sun, Shaoxing; Wu, Qiuji; Gong, Yan; Xie, Conghua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8762368/ https://www.ncbi.nlm.nih.gov/pubmed/35070171 http://dx.doi.org/10.1016/j.csbj.2022.01.004 |
author | Zeng, Zihang; Gao, Yanping; Li, Jiali; Zhang, Gong; Sun, Shaoxing; Wu, Qiuji; Gong, Yan; Xie, Conghua |
---|---|
collection | PubMed |
description | BACKGROUND: The Cox proportional hazards regression (CPH) model relies on the proportional hazards (PH) assumption: the hazard ratio associated with each variable is constant over time. CPH has been widely used to identify prognostic markers in the transcriptome. However, a comprehensive investigation of the PH assumption in transcriptomic data has been lacking. RESULTS: Whole-transcriptome data from 9,056 patients in 32 cohorts of The Cancer Genome Atlas (TCGA) and from 3 lung cancer cohorts in the Gene Expression Omnibus (GEO) were collected, and a separate CPH model was constructed for each gene to fit overall survival. On average, 8.5% of gene CPH models violated the PH assumption in the TCGA pan-cancer cohorts. In gene interaction networks, both hub and non-hub genes were likely to have non-proportional hazards in CPH models. PH violations for the same gene models were not consistent across 5 non-small cell lung cancer datasets (all kappa coefficients < 0.2), indicating that the non-proportionality of gene CPH models depended on the dataset. Furthermore, introducing log(t) or sqrt(t) time functions into CPH improved the performance of gene models in fitting overall survival in most tumors. The time-dependent CPH changed the significance of the log hazard ratio for 31.9% of gene variables. CONCLUSIONS: Our analysis showed that non-proportional hazards should not be ignored in transcriptomic data. Introducing a time-interaction term improved the performance and interpretability of CPH models for transcriptomic data with non-proportional hazards. |
format | Online Article Text |
id | pubmed-8762368 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
journal | Comput Struct Biotechnol J (Research Article) |
published online | 2022-01-07 |
license | © 2022 The Author(s). This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Violations of proportional hazard assumption in Cox regression model of transcriptomic data in TCGA pan-cancer cohorts |
topic | Research Article |
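
The record does not say how the cross-dataset agreement or the time-dependent models were computed. The Python sketch below illustrates two quantities named in the abstract, under stated assumptions: (1) Cohen's kappa between per-gene PH-violation calls (True/False) from two datasets, assuming the reported kappa coefficients are Cohen's kappa on such binary calls; and (2) the time-varying log hazard ratio β(t) = β0 + β1·g(t) implied by adding a covariate-by-time interaction term x·g(t) with g = log or sqrt. The names `cohen_kappa`, `log_hazard_ratio` and `TIME_FUNCTIONS` are hypothetical helpers, not the authors' code.

```python
import math
from typing import Callable, Dict, Sequence


# Hypothetical helper: Cohen's kappa for two binary ratings of the same genes,
# e.g. "violates the PH assumption" (True) or not (False) in dataset A vs. B.
def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of genes given the same call in both datasets.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement computed from each dataset's marginal violation rate.
    pa = sum(a) / n
    pb = sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Both datasets give one identical constant call, so kappa is undefined;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Assumed time functions g(t) for the covariate-by-time interaction term.
TIME_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "log": math.log,
    "sqrt": math.sqrt,
}


# Hypothetical helper: log hazard ratio per unit of the gene variable at time t
# when the linear predictor contains beta0 * x + beta1 * x * g(t),
# i.e. beta(t) = beta0 + beta1 * g(t). With beta1 = 0 this reduces to ordinary PH.
def log_hazard_ratio(beta0: float, beta1: float, t: float, time_fn: str = "log") -> float:
    if t <= 0:
        raise ValueError("t must be positive")
    g = TIME_FUNCTIONS[time_fn]
    return beta0 + beta1 * g(t)
```

For example, `cohen_kappa([True, True, False, False], [True, False, True, False])` gives 0.0, meaning no agreement beyond chance. With `time_fn="log"`, β(t) equals β0 at t = 1 in whatever time unit is used, so the choice of time unit affects how β0 is read.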