Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study
Main authors: | Marshall, Andrea; Altman, Douglas G; Holder, Roger L |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2010 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3019210/ https://www.ncbi.nlm.nih.gov/pubmed/21194416 http://dx.doi.org/10.1186/1471-2288-10-112 |
author | Marshall, Andrea; Altman, Douglas G; Holder, Roger L |
---|---|
collection | PubMed |
description | BACKGROUND: The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model. METHODS: Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500 replications. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied: a) complete case analysis (CC), b) single imputation using regression switching with predictive mean matching (SI), c) multiple imputation (MI) using regression switching imputation, d) MI using regression switching with predictive mean matching (MICE-PMM) and e) MI using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset, and estimates of the regression coefficients and model performance measures were obtained. RESULTS: CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. After SI, the underestimated SEs resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness. CONCLUSIONS: With 5% missingness, very few differences were seen between the results of the missing data approaches. However, MI using MICE-PMM may be the preferred missing data approach for handling between 10% and 50% MAR missingness. |
format | Text |
id | pubmed-3019210 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Med Res Methodol |
publication date | 2010-12-31 |
record date | 2011-01-12 |
license | Copyright ©2010 Marshall et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study |
topic | Research Article |
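The resampling design summarised in the description can be illustrated with a minimal Python sketch of one replication: draw 1000 cases with replacement, impose MAR missingness on one covariate, then compare complete case analysis with single imputation by predictive mean matching (PMM). This is not the authors' code. The variables `age` and `marker`, the synthetic source data, the MAR mechanism (cases with above-median `age` twice as likely to lose `marker`) and the single-predictor PMM are hypothetical simplifications; regression switching over several covariates, multiple imputation with pooling, and the Cox model fit itself are omitted.

```python
# Minimal sketch of one resampling replication (hypothetical illustration,
# not the study's implementation). Standard library only.
import random
from statistics import mean


def bootstrap_sample(rows, n, rng):
    """Draw n cases with replacement; each case is copied so it can be edited."""
    return [dict(rng.choice(rows)) for _ in range(n)]


def impose_mar(rows, target, driver, prop, rng):
    """Set `target` to None in round(prop * len(rows)) cases, in place.

    Hypothetical MAR mechanism: cases whose fully observed `driver` value is
    above the (upper) median are twice as likely to lose `target`, so
    missingness depends only on observed data. Cases are chosen by weighted
    sampling without replacement using Efraimidis-Spirakis keys u ** (1 / w).
    """
    values = sorted(r[driver] for r in rows)
    median = values[len(values) // 2]
    keyed = []
    for i, r in enumerate(rows):
        weight = 2.0 if r[driver] > median else 1.0
        keyed.append((rng.random() ** (1.0 / weight), i))
    keyed.sort(reverse=True)  # the largest keys are selected
    for _, i in keyed[: round(prop * len(rows))]:
        rows[i][target] = None
    return rows


def complete_cases(rows, keys):
    """Complete case analysis: keep only rows with every key observed."""
    return [r for r in rows if all(r[k] is not None for k in keys)]


def pmm_impute(rows, target, predictor, rng, n_donors=5):
    """Single imputation of `target` by predictive mean matching on one predictor.

    Fits ordinary least squares of target on predictor among observed cases;
    each missing case then borrows the observed value of a randomly chosen
    donor among the n_donors observed cases with the closest predicted mean.
    Returns new row dicts; the input rows are not modified.
    """
    obs = [r for r in rows if r[target] is not None]
    xbar = mean(r[predictor] for r in obs)
    ybar = mean(r[target] for r in obs)
    sxx = sum((r[predictor] - xbar) ** 2 for r in obs)
    sxy = sum((r[predictor] - xbar) * (r[target] - ybar) for r in obs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    def predict(r):
        return intercept + slope * r[predictor]

    donors = [(predict(r), r[target]) for r in obs]
    out = []
    for r in rows:
        r = dict(r)
        if r[target] is None:
            yhat = predict(r)
            nearest = sorted(donors, key=lambda d: abs(d[0] - yhat))[:n_donors]
            r[target] = rng.choice(nearest)[1]  # an actually observed value
        out.append(r)
    return out


if __name__ == "__main__":
    rng = random.Random(2010)
    # Synthetic stand-in for a complete source dataset (hypothetical variables).
    full = []
    for _ in range(7507):
        age = rng.gauss(60, 10)
        full.append({"age": age, "marker": 0.05 * age + rng.gauss(0, 1)})
    sample = bootstrap_sample(full, 1000, rng)
    sample = impose_mar(sample, "marker", "age", 0.25, rng)
    cc = complete_cases(sample, ["age", "marker"])
    si = pmm_impute(sample, "marker", "age", rng)
    print(len(cc), sum(r["marker"] is None for r in si))
```

Run as a script, the demo prints `750 0`: complete case analysis discards the 250 cases with a missing `marker`, while PMM fills every gap with a value observed in a donor. Analysing one imputed dataset as if it were complete ignores the uncertainty about the imputed values, which is why single imputation tends to understate standard errors; MI addresses this by repeating the imputation several times and pooling the results.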