
Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study

BACKGROUND: The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model. METHODS: Observed data for 1000...


Bibliographic Details
Main Authors: Marshall, Andrea; Altman, Douglas G; Holder, Roger L
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3019210/
https://www.ncbi.nlm.nih.gov/pubmed/21194416
http://dx.doi.org/10.1186/1471-2288-10-112
author Marshall, Andrea
Altman, Douglas G
Holder, Roger L
collection PubMed
description BACKGROUND: The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model. METHODS: Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500 replications. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied: a) complete case analysis (CC), b) single imputation using regression switching with predictive mean matching (SI), c) multiple imputation using regression switching imputation, d) multiple imputation using regression switching with predictive mean matching (MICE-PMM), and e) multiple imputation using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset, and estimates for the regression coefficients and model performance measures were obtained. RESULTS: CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. The underestimated SE after SI resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MI using MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness. CONCLUSIONS: Very few differences were seen between the results from all missing data approaches with 5% missingness. However, performing MI using MICE-PMM may be the preferred missing data approach for handling between 10% and 50% MAR missingness.
format Text
id pubmed-3019210
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3019210 2011-01-12 Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study Marshall, Andrea Altman, Douglas G Holder, Roger L BMC Med Res Methodol Research Article BioMed Central 2010-12-31 /pmc/articles/PMC3019210/ /pubmed/21194416 http://dx.doi.org/10.1186/1471-2288-10-112 Text en Copyright ©2010 Marshall et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study
topic Research Article