The impact of missing data on analyses of a time-dependent exposure in a longitudinal cohort: a simulation study

BACKGROUND: Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, from which participants with missing exposure data are typically excluded. We performed a...

Bibliographic Details
Main authors: Karahalios, Amalia, Baglietto, Laura, Lee, Katherine J, English, Dallas R, Carlin, John B, Simpson, Julie A
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751092/
https://www.ncbi.nlm.nih.gov/pubmed/23947681
http://dx.doi.org/10.1186/1742-7622-10-6
collection PubMed
description BACKGROUND: Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, from which participants with missing exposure data are typically excluded. We performed a simulation study to compare complete-case analysis with multiple imputation (MI) for dealing with missing data in an analysis of the association between waist circumference, measured at two waves, and the risk of colorectal cancer (a completely observed outcome).
METHODS: We generated 1,000 datasets of 41,476 individuals with values of waist circumference at waves 1 and 2 and times to the events of colorectal cancer and death to resemble the distributions of the data from the Melbourne Collaborative Cohort Study. Three proportions of missing data (15, 30 and 50%) were imposed on waist circumference at wave 2 using three missing data mechanisms: Missing Completely at Random (MCAR), and two covariate-dependent Missing at Random (MAR) scenarios, one realistic and one more extreme. We assessed the impact of missing data on two epidemiological analyses: 1) the association between change in waist circumference between waves 1 and 2 and the risk of colorectal cancer, adjusted for waist circumference at wave 1; and 2) the association between waist circumference at wave 2 and the risk of colorectal cancer, not adjusted for waist circumference at wave 1.
RESULTS: We observed very little bias for complete-case analysis or MI under all missing data scenarios, and the resulting coverage of interval estimates was near the nominal 95% level. MI showed gains in precision when waist circumference was included as a strong auxiliary variable in the imputation model.
CONCLUSIONS: This simulation study, based on data from a longitudinal cohort study, demonstrates that there is little gain in performing MI compared to a complete-case analysis in the presence of up to 50% missing data for the exposure of interest when the data are MCAR, or when missingness depends on covariates. MI will result in some gain in precision if a strong auxiliary variable that is not in the analysis model is included in the imputation model.
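The missing data mechanisms named in METHODS can be illustrated with a minimal sketch. It is not the authors' simulation code: the distributions, the use of age as the covariate driving MAR missingness, the slope value, and all function names are assumptions chosen for illustration, and the Cox regression and MI steps of the study are not reproduced. The sketch only shows how MCAR and covariate-dependent MAR missingness might be imposed on a wave 2 measurement before a complete-case analysis.

```python
import math
import random


def simulate_cohort(n, rng):
    """Generate hypothetical participants (distributions are assumed, not from the study)."""
    cohort = []
    for _ in range(n):
        age = rng.gauss(55.0, 9.0)
        waist1 = rng.gauss(85.0, 12.0)          # waist circumference at wave 1 (cm)
        waist2 = waist1 + rng.gauss(2.0, 5.0)   # waist circumference at wave 2 (cm)
        cohort.append({"age": age, "waist1": waist1, "waist2": waist2})
    return cohort


def impose_mcar(cohort, proportion, rng):
    """MCAR: every participant has the same probability of a missing wave 2 value."""
    return [dict(p, waist2=None) if rng.random() < proportion else dict(p)
            for p in cohort]


def _logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def impose_mar(cohort, proportion, slope, rng):
    """Covariate-dependent MAR: missingness probability rises with age (assumed covariate).

    The intercept is found by bisection so that the average missingness
    probability equals the requested proportion.
    """
    mean_age = sum(p["age"] for p in cohort) / len(cohort)
    centred = [p["age"] - mean_age for p in cohort]
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        mean_prob = sum(_logistic(mid + slope * c) for c in centred) / len(centred)
        if mean_prob < proportion:
            lo = mid
        else:
            hi = mid
    intercept = (lo + hi) / 2.0
    return [dict(p, waist2=None) if rng.random() < _logistic(intercept + slope * c) else dict(p)
            for p, c in zip(cohort, centred)]


def missing_fraction(cohort):
    """Share of participants whose wave 2 measurement is missing."""
    return sum(1 for p in cohort if p["waist2"] is None) / len(cohort)


def complete_cases(cohort):
    """Complete-case analysis set: participants with an observed wave 2 value."""
    return [p for p in cohort if p["waist2"] is not None]


if __name__ == "__main__":
    rng = random.Random(2013)
    cohort = simulate_cohort(5000, rng)
    for label, data in (("MCAR", impose_mcar(cohort, 0.30, rng)),
                        ("MAR", impose_mar(cohort, 0.30, 0.1, rng))):
        print(label, round(missing_fraction(data), 3), len(complete_cases(data)))
```

Under the MAR function, a larger `slope` makes missingness depend more strongly on the covariate, which is one way a "more extreme" scenario could be mimicked; the values the study actually used are not given in this record.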
id pubmed-3751092
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3751092 2013-08-28 Emerg Themes Epidemiol Research Article BioMed Central 2013-08-19 /pmc/articles/PMC3751092/ /pubmed/23947681 http://dx.doi.org/10.1186/1742-7622-10-6 Text en Copyright © 2013 Karahalios et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article