Violations of proportional hazard assumption in Cox regression model of transcriptomic data in TCGA pan-cancer cohorts

BACKGROUND: The Cox proportional hazard regression (CPH) model relies on the proportional hazard (PH) assumption: the effect of each variable on the hazard is independent of time. CPH has been widely used to identify prognostic markers in the transcriptome. However, a comprehensive investigation of the PH assumption in tra...

Bibliographic Details
Main Authors: Zeng, Zihang; Gao, Yanping; Li, Jiali; Zhang, Gong; Sun, Shaoxing; Wu, Qiuji; Gong, Yan; Xie, Conghua
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8762368/
https://www.ncbi.nlm.nih.gov/pubmed/35070171
http://dx.doi.org/10.1016/j.csbj.2022.01.004
author Zeng, Zihang
Gao, Yanping
Li, Jiali
Zhang, Gong
Sun, Shaoxing
Wu, Qiuji
Gong, Yan
Xie, Conghua
author_sort Zeng, Zihang
collection PubMed
description BACKGROUND: The Cox proportional hazard regression (CPH) model relies on the proportional hazard (PH) assumption: the effect of each variable on the hazard is independent of time. CPH has been widely used to identify prognostic markers in the transcriptome. However, a comprehensive investigation of the PH assumption in transcriptomic data has been lacking. RESULTS: Whole-transcriptome data from 9,056 patients in 32 cohorts of The Cancer Genome Atlas (TCGA) and from 3 lung cancer cohorts in the Gene Expression Omnibus were collected, and a CPH model of overall survival was fitted for each gene separately. On average, 8.5% of gene CPH models violated the PH assumption in the TCGA pan-cancer cohorts. In the gene interaction networks, both hub and non-hub genes in CPH models were likely to have non-proportional hazards. Violations of the PH assumption for the same gene models were not consistent across 5 non-small cell lung cancer datasets (all kappa coefficients < 0.2), indicating that the non-proportionality of gene CPH models depended on the dataset. Furthermore, introducing log(t) or sqrt(t) time functions into CPH improved the fit of gene models to overall survival in most tumors. The time-dependent CPH changed the significance of the log hazard ratio for 31.9% of gene variables. CONCLUSIONS: Our analysis indicates that non-proportional hazards should not be ignored in transcriptomic data. Introducing a time interaction term improved the performance and interpretability of CPH models of transcriptomic data with non-proportional hazards.
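Two quantities mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the log(t) or sqrt(t) time function enters the model as an interaction with gene expression, so that h(t | x) = h0(t) * exp(beta * x + gamma * x * g(t)), and that the reported kappa is Cohen's kappa computed on binary "violates PH / does not violate PH" calls for the same genes in two datasets. The function names `hazard_ratio_at` and `cohen_kappa`, and all numeric values, are hypothetical.

```python
import math
from typing import Callable, Sequence


def hazard_ratio_at(beta: float, gamma: float, t: float,
                    g: Callable[[float], float] = math.log) -> float:
    """Hazard ratio per unit of expression at time t.

    Assumed model: h(t | x) = h0(t) * exp(beta * x + gamma * x * g(t)),
    so the log hazard ratio beta + gamma * g(t) varies with time
    whenever gamma != 0. With gamma == 0 the ordinary PH model is recovered.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return math.exp(beta + gamma * g(t))


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two equal-length vectors of binary calls."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of genes with the same call in both datasets.
    observed = sum(bool(x) == bool(y) for x, y in zip(a, b)) / n
    # Chance agreement from the marginal rate of positive calls in each dataset.
    pa = sum(bool(x) for x in a) / n
    pb = sum(bool(y) for y in b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both vectors are constant
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical coefficients: the effect of the gene weakens over time.
    for t in (1.0, 12.0, 60.0):
        print(t, hazard_ratio_at(beta=0.5, gamma=-0.3, t=t))
    # Hypothetical PH-violation calls for four genes in two datasets.
    print(cohen_kappa([True, True, False, False], [True, False, False, False]))
```

With g = log, the hazard ratio at t = 1 equals exp(beta), which makes beta interpretable as the log hazard ratio at one time unit. A kappa near 0 means that two datasets agree on which genes violate the PH assumption no more often than chance would predict.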
format Online
Article
Text
id pubmed-8762368
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8762368 2022-01-21 Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2022-01-07 /pmc/articles/PMC8762368/ /pubmed/35070171 http://dx.doi.org/10.1016/j.csbj.2022.01.004 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Violations of proportional hazard assumption in Cox regression model of transcriptomic data in TCGA pan-cancer cohorts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8762368/
https://www.ncbi.nlm.nih.gov/pubmed/35070171
http://dx.doi.org/10.1016/j.csbj.2022.01.004