Recovery of information from multiple imputation: a simulation study

BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset b...

Bibliographic Details
Main Authors:	Lee, Katherine J; Carlin, John B
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Analytic Perspective
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3544721/ https://www.ncbi.nlm.nih.gov/pubmed/22695083 http://dx.doi.org/10.1186/1742-7622-9-3

_version_	1782255836906651648
author	Lee, Katherine J; Carlin, John B
collection	PubMed
description	BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases.
METHODS: Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly-skewed continuous covariate or in the binary exposure. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage non-normality. Bias, precision, mean-squared error and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses.
RESULTS: For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision with more missing data. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate’s effect when skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure.
CONCLUSIONS: Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple imputation can become unreliable and introduce bias not present in a complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.
format	Online Article Text
id	pubmed-3544721
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3544721 2013-01-15 Recovery of information from multiple imputation: a simulation study Lee, Katherine J; Carlin, John B Emerg Themes Epidemiol Analytic Perspective BioMed Central 2012-06-13 /pmc/articles/PMC3544721/ /pubmed/22695083 http://dx.doi.org/10.1186/1742-7622-9-3 Text en Copyright ©2012 Lee and Carlin; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Recovery of information from multiple imputation: a simulation study
topic	Analytic Perspective
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3544721/ https://www.ncbi.nlm.nih.gov/pubmed/22695083 http://dx.doi.org/10.1186/1742-7622-9-3

Similar Items