
Logistic regression vs. predictive mean matching for imputing binary covariates

Bibliographic details
Main authors: Austin, Peter C; van Buuren, Stef
Format: Online Article Text
Language: English
Published: SAGE Publications 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683343/
https://www.ncbi.nlm.nih.gov/pubmed/37750213
http://dx.doi.org/10.1177/09622802231198795
author Austin, Peter C
van Buuren, Stef
collection PubMed
description Multivariate imputation using chained equations (MICE) is a popular algorithm for imputing missing data that entails specifying multivariate models through conditional distributions. For imputing missing continuous variables, two common imputation methods are the use of parametric imputation using a linear model and predictive mean matching. When imputing missing binary variables, the default approach is parametric imputation using a logistic regression model. In the R implementation of MICE, the use of predictive mean matching can be substantially faster than using logistic regression as the imputation model for missing binary variables. However, there is a paucity of research into the statistical performance of predictive mean matching for imputing missing binary variables. Our objective was to compare the statistical performance of predictive mean matching with that of logistic regression for imputing missing binary variables. Monte Carlo simulations were used to compare the statistical performance of predictive mean matching with that of logistic regression for imputing missing binary outcomes when the analysis model of scientific interest was a multivariable logistic regression model. We varied the size of the analysis samples (N = 250, 500, 1,000, 5,000, and 10,000) and the prevalence of missing data (5%–50% in increments of 5%). In general, the statistical performance of predictive mean matching was virtually identical to that of logistic regression for imputing missing binary variables when the analysis model was a logistic regression model. This was true across a wide range of scenarios defined by sample size and the prevalence of missing data. In conclusion, predictive mean matching can be used to impute missing binary variables. The use of predictive mean matching to impute missing binary variables can result in a substantial reduction in computer processing time when conducting simulations of multiple imputation.
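The two imputation models compared in the abstract can be illustrated with a minimal, single-predictor sketch. It assumes one fully observed continuous predictor `x` and one binary variable `z` with missing entries (`None`). All function names (`ols_fit`, `logistic_fit`, `impute_logreg`, `impute_pmm`) are hypothetical and are not part of the mice package or the article. Unlike the mice methods `logreg` and `pmm`, the sketch does not draw regression parameters from their posterior distribution, so it shows a single imputation step rather than proper multiple imputation. Logistic-regression imputation draws each missing value from a Bernoulli distribution with the fitted probability; predictive mean matching fits a linear model to the 0/1 values, finds the `k` observed cases whose predicted means are closest (`k = 5` here, mirroring the mice default of five donors), and copies the observed value of one donor chosen at random.

```python
import math
import random
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration only; not the mice API.


def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for y = a + b*x (the linear model behind PMM)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def logistic_fit(x: List[float], y: List[float], iters: int = 25) -> Tuple[float, float]:
    """Maximum-likelihood logistic regression (intercept + one slope) via Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p  # score for the intercept
            g1 += (yi - p) * xi  # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add inverse(information matrix) times the score vector
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def impute_logreg(x: List[float], z: List[Optional[int]],
                  rng: random.Random) -> List[Optional[int]]:
    """Parametric imputation: draw each missing z from Bernoulli(fitted probability)."""
    obs = [i for i, v in enumerate(z) if v is not None]
    a, b = logistic_fit([x[i] for i in obs], [float(z[i]) for i in obs])
    out = list(z)
    for i, v in enumerate(z):
        if v is None:
            p = 1.0 / (1.0 + math.exp(-(a + b * x[i])))
            out[i] = 1 if rng.random() < p else 0
    return out


def impute_pmm(x: List[float], z: List[Optional[int]], rng: random.Random,
               k: int = 5) -> List[Optional[int]]:
    """Predictive mean matching: copy the observed z of a randomly chosen close donor."""
    obs = [i for i, v in enumerate(z) if v is not None]
    a, b = ols_fit([x[i] for i in obs], [float(z[i]) for i in obs])
    pred = [a + b * xi for xi in x]  # predicted means from the linear model
    out = list(z)
    for i, v in enumerate(z):
        if v is None:
            # The k observed cases whose predicted means are closest to case i
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            out[i] = z[rng.choice(donors)]
    return out


if __name__ == "__main__":
    rng = random.Random(2023)
    x = [rng.gauss(0.0, 1.0) for _ in range(500)]
    # Simulated binary covariate with P(z = 1 | x) = logistic(-0.5 + x)
    z_full = [1 if rng.random() < 1.0 / (1.0 + math.exp(0.5 - xi)) else 0 for xi in x]
    # Set roughly 30% of z to missing completely at random
    z = [None if rng.random() < 0.3 else v for v in z_full]
    n_miss = sum(v is None for v in z)
    for name, imp in (("logreg", impute_logreg(x, z, rng)), ("pmm", impute_pmm(x, z, rng))):
        share = sum(imp[i] for i in range(len(z)) if z[i] is None) / n_miss
        print(f"{name}: {n_miss} values imputed, share imputed as 1 = {share:.2f}")
```

Because every PMM imputation is copied from an observed donor, the imputed values stay in {0, 1} even though the underlying model is linear. In this sketch the logistic fit needs an iterative solver while the linear fit has a closed form, which is one plausible contributor to the speed difference the abstract reports for the R implementation.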
format Online
Article
Text
id pubmed-10683343
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-10683343 2023-11-30 Logistic regression vs. predictive mean matching for imputing binary covariates. Austin, Peter C; van Buuren, Stef. Stat Methods Med Res. Original Research Articles. SAGE Publications 2023-09-26 2023-11 /pmc/articles/PMC10683343/ /pubmed/37750213 http://dx.doi.org/10.1177/09622802231198795 Text en. © The Author(s) 2023. https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Logistic regression vs. predictive mean matching for imputing binary covariates
topic Original Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683343/
https://www.ncbi.nlm.nih.gov/pubmed/37750213
http://dx.doi.org/10.1177/09622802231198795