Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

BACKGROUND: Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. METHODS: A simulation study of a linear regression with a response Y and two predictors X(1) and X(...

Bibliographic Details
Main authors: Hardt, Jochen; Herke, Max; Leonhart, Rainer
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3538666/
https://www.ncbi.nlm.nih.gov/pubmed/23216665
http://dx.doi.org/10.1186/1471-2288-12-184
author Hardt, Jochen
Herke, Max
Leonhart, Rainer
collection PubMed
description BACKGROUND: Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. METHODS: A simulation study of a linear regression with a response Y and two predictors X(1) and X(2) was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r=.10) vs. moderate correlations (r=.50) with X’s and Y. RESULTS: The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. CONCLUSION: More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.
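The rule of thumb in the conclusion can be read as requiring at least three complete cases for every variable in the imputation model. The following is a minimal sketch of that reading, not code from the article; the function names, the `cases_per_variable` parameter, and the choice to count the analysis variables (Y, X1, X2) together with the auxiliary variables are assumptions made for illustration.

```python
def max_imputation_variables(n_complete_cases: int, cases_per_variable: int = 3) -> int:
    """Largest variable count allowed under a 1 : cases_per_variable rule.

    Assumption: "variables" means every variable in the imputation model,
    i.e. the analysis variables plus any auxiliary variables.
    """
    if n_complete_cases < 0:
        raise ValueError("n_complete_cases must be non-negative")
    if cases_per_variable <= 0:
        raise ValueError("cases_per_variable must be positive")
    # Integer division: each variable "uses up" cases_per_variable complete cases.
    return n_complete_cases // cases_per_variable


def within_rule_of_thumb(
    n_variables: int, n_complete_cases: int, cases_per_variable: int = 3
) -> bool:
    """Return True if the variables-to-complete-cases ratio is not below 1 : 3."""
    if n_variables < 0:
        raise ValueError("n_variables must be non-negative")
    return n_variables <= max_imputation_variables(n_complete_cases, cases_per_variable)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the study.
    complete_cases = 30
    total_variables = 3 + 10  # Y, X1, X2 plus 10 auxiliary variables
    print(max_imputation_variables(complete_cases))
    print(within_rule_of_thumb(total_variables, complete_cases))
```

With a hypothetical 30 complete cases, this reading allows at most 10 variables in total, so a model with 3 analysis variables and 10 auxiliary variables would exceed the suggested limit.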
format Online
Article
Text
id pubmed-3538666
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3538666 2013-01-10 BMC Med Res Methodol Research Article BioMed Central 2012-12-05 /pmc/articles/PMC3538666/ /pubmed/23216665 http://dx.doi.org/10.1186/1471-2288-12-184 Text en Copyright ©2012 Hardt et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research
topic Research Article