Empirical Comparison of Imputation Methods for Multivariate Missing Data in Public Health

Sample estimates derived from data with missing values may be unreliable and may negatively impact the inferences that researchers make about the underlying population due to nonresponse bias. As a result, imputation is often preferred to listwise deletion in handling multivariate missing data. In t...

Bibliographic Details
Main authors: Pan, Steven; Chen, Sixia
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9864541/
https://www.ncbi.nlm.nih.gov/pubmed/36674279
http://dx.doi.org/10.3390/ijerph20021524
_version_ 1784875609529253888
author Pan, Steven
Chen, Sixia
author_facet Pan, Steven
Chen, Sixia
author_sort Pan, Steven
collection PubMed
description Sample estimates derived from data with missing values may be unreliable and may negatively impact the inferences that researchers make about the underlying population due to nonresponse bias. As a result, imputation is often preferred to listwise deletion in handling multivariate missing data. In this study, we compared three popular imputation methods: sequential multiple imputation, fractional hot-deck imputation, and generalized efficient regression-based imputation with latent processes for handling multivariate missingness under different missing patterns by conducting descriptive and regression analyses on the imputed data and seeing how the estimates differ from those generated from the full sample. Limited Monte Carlo simulation results by using the National Health and Nutrition Examination Survey and Behavioral Risk Factor Surveillance System are presented to demonstrate the effect of each imputation method on reducing bias and increasing efficiency for the parameter estimate of interest for that particular incomplete variable. Although these three methods did not always outperform listwise deletion in our simulated missing patterns, they improved many descriptive and regression estimates when used to impute all incomplete variables at once.
format Online
Article
Text
id pubmed-9864541
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9864541 2023-01-22 Empirical Comparison of Imputation Methods for Multivariate Missing Data in Public Health Pan, Steven Chen, Sixia Int J Environ Res Public Health Article MDPI 2023-01-14 /pmc/articles/PMC9864541/ /pubmed/36674279 http://dx.doi.org/10.3390/ijerph20021524 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Pan, Steven
Chen, Sixia
Empirical Comparison of Imputation Methods for Multivariate Missing Data in Public Health
title Empirical Comparison of Imputation Methods for Multivariate Missing Data in Public Health
topic Article
work_keys_str_mv AT pansteven empiricalcomparisonofimputationmethodsformultivariatemissingdatainpublichealth
AT chensixia empiricalcomparisonofimputationmethodsformultivariatemissingdatainpublichealth