
Imputing missing covariate values for the Cox model

Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantl...


Bibliographic Details
Main Authors: White, Ian R, Royston, Patrick
Format: Text
Language: English
Published: John Wiley & Sons, Ltd. 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2998703/
https://www.ncbi.nlm.nih.gov/pubmed/19452569
http://dx.doi.org/10.1002/sim.3618
_version_ 1782193391547711488
author White, Ian R
Royston, Patrick
collection PubMed
description Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear. We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H_0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H_0(T), we approximate it by the Nelson–Aalen estimator of H(T) or estimate it by Cox regression. We compare the methods using simulation studies. We find that using log T biases covariate-outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson–Aalen estimator of H(T) in the imputation model. Copyright © 2009 John Wiley & Sons, Ltd.
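The recommendation in the description (include the event indicator D and the Nelson–Aalen estimator of H(T) in the imputation model) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `nelson_aalen` and `imputation_predictors` are hypothetical, tied events are pooled at each distinct event time, and the logistic or linear regression that actually imputes X is left to whatever multiple-imputation routine is in use.

```python
from bisect import bisect_right
from itertools import accumulate


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard estimate at each subject's own time.

    times  : observed event-or-censoring times T
    events : event indicators D (1 = event, 0 = censored)
    """
    # Distinct times at which at least one event occurred, in increasing order.
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    increments = []
    for tj in event_times:
        d_j = sum(1 for t, d in zip(times, events) if t == tj and d == 1)
        n_j = sum(1 for t in times if t >= tj)  # at risk just before tj
        increments.append(d_j / n_j)
    # cum[k] is the estimate after the first k distinct event times.
    cum = [0.0] + list(accumulate(increments))
    # For each subject, count the event times <= T and look up the estimate.
    return [cum[bisect_right(event_times, t)] for t in times]


def imputation_predictors(times, events, z_rows):
    """Build one predictor row [D, H(T), Z...] per subject.

    z_rows is a list of lists holding the fully observed covariates Z.
    """
    h = nelson_aalen(times, events)
    return [[d, h_i, *z] for d, h_i, z in zip(events, h, z_rows)]


if __name__ == "__main__":
    # Toy data (invented for illustration only).
    T = [1, 2, 3, 4]
    D = [1, 0, 1, 1]
    Z = [[5], [6], [7], [8]]
    for row in imputation_predictors(T, D, Z):
        print(row)
```

Each row would then serve as the predictors in a regression for the incomplete covariate X, in place of the more usual D and log T.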
format Text
id pubmed-2998703
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher John Wiley & Sons, Ltd.
record_format MEDLINE/PubMed
spelling pubmed-2998703 2010-12-31 Imputing missing covariate values for the Cox model White, Ian R Royston, Patrick Stat Med Research Article Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear. We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H_0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H_0(T), we approximate it by the Nelson–Aalen estimator of H(T) or estimate it by Cox regression. We compare the methods using simulation studies. We find that using log T biases covariate-outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson–Aalen estimator of H(T) in the imputation model. Copyright © 2009 John Wiley & Sons, Ltd. John Wiley & Sons, Ltd. 2009-07-10 2009-05-19 /pmc/articles/PMC2998703/ /pubmed/19452569 http://dx.doi.org/10.1002/sim.3618 Text en Copyright © 2009 John Wiley & Sons, Ltd. http://creativecommons.org/licenses/by/2.5/ Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.
title Imputing missing covariate values for the Cox model
topic Research Article