
Multiple imputation methods for handling missing values in a longitudinal categorical variable with restrictions on transitions over time: a simulation study

BACKGROUND: Longitudinal categorical variables are sometimes restricted in terms of how individuals transition between categories over time. For example, with a time-dependent measure of smoking categorised as never-smoker, ex-smoker, and current-smoker, current-smokers or ex-smokers cannot transiti...


Bibliographic Details
Main authors:	De Silva, Anurika Priyanjali; Moreno-Betancur, Margarita; De Livera, Alysha Madhu; Lee, Katherine Jane; Simpson, Julie Anne
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329074/ https://www.ncbi.nlm.nih.gov/pubmed/30630434 http://dx.doi.org/10.1186/s12874-018-0653-0

author	De Silva, Anurika Priyanjali; Moreno-Betancur, Margarita; De Livera, Alysha Madhu; Lee, Katherine Jane; Simpson, Julie Anne
collection	PubMed
description	BACKGROUND: Longitudinal categorical variables are sometimes restricted in how individuals can transition between categories over time. For example, with a time-dependent measure of smoking categorised as never-smoker, ex-smoker, and current-smoker, current-smokers and ex-smokers cannot transition to never-smoker at a subsequent wave. These longitudinal variables often contain missing values; however, there is little guidance on whether these restrictions need to be accommodated when using multiple imputation methods. Multiply imputing such missing values while ignoring the restrictions could lead to implausible transitions. METHODS: We designed a simulation study based on the Longitudinal Study of Australian Children, where the target analysis was the association between (incomplete) maternal smoking and childhood obesity. We set varying proportions of data on maternal smoking to missing completely at random or missing at random. We compared the performance of fully conditional specification (with multinomial logistic imputation, ordinal logistic imputation, and predictive mean matching); two-fold fully conditional specification; indicator-based imputation under multivariate normal imputation with projected distance-based rounding; and continuous imputation under multivariate normal imputation with calibration. Each of these multiple imputation methods was applied accounting for the restrictions using a semi-deterministic imputation procedure. RESULTS: Overall, we observed reduced bias when applying multiple imputation methods with restrictions, and fully conditional specification with predictive mean matching performed best. Applying fully conditional specification and two-fold fully conditional specification for imputing nominal variables based on multinomial logistic regression had severe convergence issues. Both imputation methods under multivariate normal imputation produced biased estimates when restrictions were not accommodated; however, we observed substantial reductions in bias when restrictions were applied with continuous imputation under multivariate normal imputation with calibration. CONCLUSION: In similar longitudinal settings, we recommend the use of fully conditional specification with predictive mean matching, with restrictions applied during the imputation stage. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12874-018-0653-0) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6329074
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6329074 2019-01-16 BMC Med Res Methodol Research Article BioMed Central 2019-01-10 /pmc/articles/PMC6329074/ /pubmed/30630434 http://dx.doi.org/10.1186/s12874-018-0653-0 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Multiple imputation methods for handling missing values in a longitudinal categorical variable with restrictions on transitions over time: a simulation study
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329074/ https://www.ncbi.nlm.nih.gov/pubmed/30630434 http://dx.doi.org/10.1186/s12874-018-0653-0
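
The transition restriction described in the abstract (a current-smoker or ex-smoker cannot become a never-smoker at a later wave) can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' semi-deterministic imputation procedure: the function names, the string coding of the categories, and the use of None for a missing wave are all assumptions made for the example. It flags implausible observed transitions and lists which categories a missing value could take given the observed waves around it; when only one category remains, the missing value is effectively determined by the restriction.

```python
from typing import List, Optional, Set

# Hypothetical category coding (not taken from the article).
NEVER, EX, CURRENT = "never-smoker", "ex-smoker", "current-smoker"
ALL_CATEGORIES = {NEVER, EX, CURRENT}


def implausible_waves(history: List[Optional[str]]) -> List[int]:
    """Return indices of waves coded never-smoker after an earlier
    ex-smoker or current-smoker wave. None marks a missing wave."""
    flagged = []
    ever_smoked = False
    for wave, status in enumerate(history):
        if status is None:
            continue  # missing values carry no information here
        if status == NEVER and ever_smoked:
            flagged.append(wave)
        elif status in (EX, CURRENT):
            ever_smoked = True
    return flagged


def allowed_categories(history: List[Optional[str]], wave: int) -> Set[str]:
    """Categories a missing value at `wave` may take so that the
    restriction holds against the observed waves before and after it."""
    earlier = [s for s in history[:wave] if s is not None]
    later = [s for s in history[wave + 1:] if s is not None]
    if any(s in (EX, CURRENT) for s in earlier):
        # Smoking was already reported, so never-smoker is ruled out.
        return {EX, CURRENT}
    if NEVER in later:
        # Still a never-smoker later on, so must be a never-smoker now.
        return {NEVER}
    return set(ALL_CATEGORIES)


if __name__ == "__main__":
    print(implausible_waves([CURRENT, None, NEVER]))
    print(allowed_categories([None, NEVER], 0))
```

An imputation routine could draw only from the set returned by allowed_categories, which is one simple way to keep imputed values consistent with the restriction; how the article's methods enforce it is described in the full text, not here.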
