A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study

BACKGROUND: Missing data is a common problem in epidemiological studies, and is particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods (fully conditional specification (FCS) and multivariate normal imputation (MVNI))...

Bibliographic Details
Main Authors: De Silva, Anurika Priyanjali, Moreno-Betancur, Margarita, De Livera, Alysha Madhu, Lee, Katherine Jane, Simpson, Julie Anne
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5526258/
https://www.ncbi.nlm.nih.gov/pubmed/28743256
http://dx.doi.org/10.1186/s12874-017-0372-y
author De Silva, Anurika Priyanjali
Moreno-Betancur, Margarita
De Livera, Alysha Madhu
Lee, Katherine Jane
Simpson, Julie Anne
collection PubMed
description BACKGROUND: Missing data are a common problem in epidemiological studies and are particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods (fully conditional specification (FCS) and multivariate normal imputation (MVNI)) treat each repeated measurement of the same time-dependent variable as just another ‘distinct’ variable for imputation and therefore do not make the most of the longitudinal structure of the data. Only a few studies have explored extensions to the standard approaches to account for the temporal structure of longitudinal data. One suggestion is the two-fold fully conditional specification (two-fold FCS) algorithm, which restricts the imputation of a time-dependent variable to time blocks, so that the imputation model includes only measurements taken at the specified and adjacent times. To date, no study has investigated the performance of two-fold FCS and standard MI methods for handling missing data in a time-varying covariate with a non-linear trajectory over time, a commonly encountered scenario in epidemiological studies.
METHODS: We simulated 1000 datasets of 5000 individuals based on the Longitudinal Study of Australian Children (LSAC). Three missing data mechanisms (missing completely at random (MCAR), and a weak and a strong missing at random (MAR) scenario) were used to impose missingness on body mass index (BMI)-for-age z-scores, a continuous time-varying exposure variable with a non-linear trajectory over time. We evaluated the performance of FCS, MVNI, and two-fold FCS for handling up to 50% missing data when assessing the association between childhood obesity and sleep problems.
RESULTS: The standard two-fold FCS produced slightly more biased and less precise estimates than FCS and MVNI. We observed slight improvements in bias and precision when using a time window width of two for the two-fold FCS algorithm compared with the standard width of one.
CONCLUSION: We recommend FCS or MVNI in similar longitudinal settings. When convergence issues arise because of a large number of time points or variables with missing values, we recommend two-fold FCS with exploration of a suitable time window.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-017-0372-y) contains supplementary material, which is available to authorized users.
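The time-block idea behind two-fold FCS can be illustrated with a minimal sketch. It only lists which waves would contribute measurements to the imputation model for a given wave and window width; it performs no imputation and is not the authors' implementation. The function name `two_fold_block`, the 1-based wave numbering, and the choice of five waves are illustrative assumptions that do not come from the record.

```python
def two_fold_block(wave, n_waves, width=1):
    """Return the waves whose measurements enter the imputation model
    when a time-dependent variable is imputed at `wave`.

    Hypothetical helper: waves are numbered 1..n_waves, and `width` is
    the number of adjacent waves used on each side of `wave`.
    """
    if not 1 <= wave <= n_waves:
        raise ValueError("wave must lie between 1 and n_waves")
    if width < 0:
        raise ValueError("width must be non-negative")
    first = max(1, wave - width)        # clip the block at the first wave
    last = min(n_waves, wave + width)   # clip the block at the last wave
    return list(range(first, last + 1))


if __name__ == "__main__":
    n_waves = 5  # illustrative number of waves, not taken from the study
    for width in (1, 2):
        for wave in range(1, n_waves + 1):
            block = two_fold_block(wave, n_waves, width)
            print(f"width={width} wave={wave} block={block}")
```

With a width of one, the block for an interior wave holds that wave and its two neighbours. With a width of two it can hold up to five waves, which gives the imputation model more of the trajectory to work with.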
format Online
Article
Text
id pubmed-5526258
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5526258 2017-08-02 BMC Med Res Methodol Research Article BioMed Central 2017-07-25 /pmc/articles/PMC5526258/ /pubmed/28743256 http://dx.doi.org/10.1186/s12874-017-0372-y Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5526258/
https://www.ncbi.nlm.nih.gov/pubmed/28743256
http://dx.doi.org/10.1186/s12874-017-0372-y