
Implementing Multiple Imputation for Missing Data in Longitudinal Studies When Models are Not Feasible: An Example Using the Random Hot Deck Approach

PURPOSE: Researchers often use model-based multiple imputation to handle missing at random data to minimize bias. However, constraints within the data may sometimes result in implausible values, making model-based imputation infeasible. In these contexts, we illustrate how random hot deck imputation...


Bibliographic Details
Main Authors:	Wang, Chinchin, Stokes, Tyrel, Steele, Russell J, Wedderkopp, Niels, Shrier, Ian
Format:	Online Article Text
Language:	English
Published:	Dove 2022
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9675352/ https://www.ncbi.nlm.nih.gov/pubmed/36411940 http://dx.doi.org/10.2147/CLEP.S368303

_version_	1784833355071619072
author	Wang, Chinchin Stokes, Tyrel Steele, Russell J Wedderkopp, Niels Shrier, Ian
author_sort	Wang, Chinchin
collection	PubMed
description	PURPOSE: Researchers often use model-based multiple imputation to handle missing at random data to minimize bias. However, constraints within the data may sometimes result in implausible values, making model-based imputation infeasible. In these contexts, we illustrate how random hot deck imputation can allow for plausible multiple imputation in longitudinal studies. PATIENTS AND METHODS: Our motivating example is the Childhood Health, Activity, and Motor Performance School Study Denmark (CHAMPS-DK), a prospective cohort study that measured weekly sports participation for 1700 Danish schoolchildren. Using observed data on 4 variables (pain, activity frequency, sport, sport counts), we created a gold-standard data set without missing data. We then created a synthetic data set by setting some variable values to missing based on a prediction model that mimicked real-data missingness patterns. To create 5 imputed data sets, we matched each record with missing data to several fully observed records, generated probabilities from matched records, and sampled from these records based on the probability of each occurring. We assessed variability and agreement (kappa) between the imputed data sets and the gold-standard data set. We compare results to common model-based imputation methods. RESULTS: Variability across data sets appeared reasonable. The range of kappa for the random hot deck approach was moderate for activity frequency (0.65 to 0.71) and sport (0.59 to 0.85), and poor for common model-based approaches (range 0.00 to 0.11). The range of kappas for sport count was strong (0.87 to 0.97) for random hot deck imputation and weak to moderate (0.55 to 0.71) for common model-based imputation. Agreement was higher when more information was present, and when prevalence was higher for our binary variable sport. 
CONCLUSION: Random hot deck imputation should be considered as an alternative method when model-based approaches are infeasible, specifically where there are constraints within and between covariates.
format	Online Article Text
id	pubmed-9675352
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Dove
record_format	MEDLINE/PubMed
spelling	pubmed-9675352 2022-11-20 Implementing Multiple Imputation for Missing Data in Longitudinal Studies When Models are Not Feasible: An Example Using the Random Hot Deck Approach Wang, Chinchin Stokes, Tyrel Steele, Russell J Wedderkopp, Niels Shrier, Ian Clin Epidemiol Original Research Dove 2022-11-15 /pmc/articles/PMC9675352/ /pubmed/36411940 http://dx.doi.org/10.2147/CLEP.S368303 Text en © 2022 Wang et al. https://creativecommons.org/licenses/by-nc/3.0/ This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (https://creativecommons.org/licenses/by-nc/3.0/). By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms (https://www.dovepress.com/terms.php).
title	Implementing Multiple Imputation for Missing Data in Longitudinal Studies When Models are Not Feasible: An Example Using the Random Hot Deck Approach
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9675352/ https://www.ncbi.nlm.nih.gov/pubmed/36411940 http://dx.doi.org/10.2147/CLEP.S368303
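
The abstract above describes the procedure only in outline: match each record with missing data to fully observed records, derive probabilities from the matches, sample accordingly, repeat to obtain 5 imputed data sets, and compare each to the gold standard with kappa. The following is a minimal Python sketch of that idea, not the authors' implementation. The variable names, the exact-match rule on observed variables, the fallback to all complete records when no donor matches, and every function name are assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical names for the four variables mentioned in the abstract.
VARIABLES = ("pain", "activity_frequency", "sport", "sport_count")


def find_donors(record, complete_records):
    """Return complete records that agree with `record` on all its observed values.

    Exact matching is an assumption of this sketch; the abstract does not
    specify the matching rule.
    """
    observed = {k: record[k] for k in VARIABLES if record[k] is not None}
    return [d for d in complete_records
            if all(d[k] == v for k, v in observed.items())]


def hot_deck_draw(record, complete_records, rng):
    """Fill the missing values of one record from a randomly drawn donor pattern."""
    missing = [k for k in VARIABLES if record[k] is None]
    if not missing:
        return dict(record)
    # Assumed fallback: use every complete record when nothing matches.
    donors = find_donors(record, complete_records) or complete_records
    # Empirical probability of each joint pattern of the missing variables.
    counts = Counter(tuple(d[k] for k in missing) for d in donors)
    patterns = list(counts)
    weights = [counts[p] / len(donors) for p in patterns]
    chosen = rng.choices(patterns, weights=weights, k=1)[0]
    filled = dict(record)
    filled.update(zip(missing, chosen))
    return filled


def multiply_impute(records, m=5, seed=0):
    """Create `m` imputed copies of `records` (missing values are None)."""
    rng = random.Random(seed)
    complete = [r for r in records
                if all(r[k] is not None for k in VARIABLES)]
    if not complete:
        raise ValueError("at least one fully observed record is required")
    return [[hot_deck_draw(r, complete, rng) for r in records]
            for _ in range(m)]


def cohen_kappa(truth, imputed):
    """Cohen's kappa between two equal-length sequences of categories."""
    n = len(truth)
    observed = sum(t == i for t, i in zip(truth, imputed)) / n
    t_counts, i_counts = Counter(truth), Counter(imputed)
    expected = sum(t_counts[c] * i_counts[c] for c in t_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both sequences are one constant category
    return (observed - expected) / (1.0 - expected)
```

In this sketch the missing values are drawn jointly from a single donor pattern, so whenever at least one matched donor exists the completed record is identical to a record that was actually observed. That is one way constraints within and between variables can be respected without a model. Agreement for one variable could then be summarized by calling cohen_kappa once per imputed data set and reporting the range.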

