Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data

Bibliographic Details
Main authors: Welch, Catherine A, Petersen, Irene, Bartlett, Jonathan W, White, Ian R, Marston, Louise, Morris, Richard W, Nazareth, Irwin, Walters, Kate, Carpenter, James
Format: Online Article Text
Language: English
Published: BlackWell Publishing Ltd 2014
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4285297/
https://www.ncbi.nlm.nih.gov/pubmed/24782349
http://dx.doi.org/10.1002/sim.6184
collection PubMed
description Most implementations of multiple imputation (MI) of missing data are designed for simple rectangular data structures and ignore the temporal ordering of the data. Therefore, when applying MI to longitudinal data with intermittent patterns of missing data, some alternative strategies must be considered. One approach is to divide data into time blocks and implement MI independently at each block. An alternative approach is to include all time blocks in the same MI model. With increasing numbers of time blocks, this approach is likely to break down because of collinearity and over-fitting. The new two-fold fully conditional specification (FCS) MI algorithm addresses these issues by conditioning only on measurements that are local in time. We describe and report the results of a novel simulation study to critically evaluate the two-fold FCS algorithm and its suitability for imputation of longitudinal electronic health records. After generating a full data set, approximately 70% of selected continuous and categorical variables were made missing completely at random in each of ten time blocks. Subsequently, we applied a simple time-to-event model. We compared efficiency of estimated coefficients from a complete records analysis, MI of data in the baseline time block and the two-fold FCS algorithm. The results show that the two-fold FCS algorithm maximises the use of the available data, with the gain relative to baseline MI depending on the strength of correlations within and between variables. Using this approach also increases plausibility of the missing at random assumption by using repeated measures over time of variables whose baseline values may be missing.
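The idea summarised in the abstract, imputing each time block while conditioning only on measurements that are local in time, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes two continuous variables simulated with an AR(1) structure, a time window of one block on either side, and plain least-squares regression with added residual noise in place of proper posterior draws; it also returns a single completed data set rather than multiple imputations and does not handle categorical variables. The names `make_mcar`, `two_fold_sketch`, `width`, `among_iters` and `within_iters` are hypothetical.

```python
import numpy as np

def make_mcar(data, prop, rng):
    """Copy `data` and set roughly `prop` of its entries to NaN,
    independently of all values (missing completely at random)."""
    out = data.copy()
    out[rng.random(data.shape) < prop] = np.nan
    return out

def two_fold_sketch(obs, width=1, among_iters=5, within_iters=3, seed=0):
    """Simplified two-fold-style imputation for an array of shape (n, T, p).
    Assumes every (time block, variable) cell has some observed values."""
    rng = np.random.default_rng(seed)
    n, T, p = obs.shape
    miss = np.isnan(obs)
    # Start from the observed mean of each (time block, variable) cell.
    filled = np.where(miss, np.nanmean(obs, axis=0), obs)
    for _ in range(among_iters):              # among-time iterations
        for t in range(T):
            lo, hi = max(0, t - width), min(T, t + width + 1)
            for _ in range(within_iters):     # within-time iterations
                for j in range(p):
                    m = miss[:, t, j]
                    if not m.any():
                        continue
                    # Predictors: all variables in the local time window,
                    # excluding the variable currently being imputed.
                    window = filled[:, lo:hi, :].reshape(n, -1)
                    target_col = (t - lo) * p + j
                    X = np.delete(window, target_col, axis=1)
                    X = np.column_stack([np.ones(n), X])
                    y = filled[~m, t, j]
                    beta, *_ = np.linalg.lstsq(X[~m], y, rcond=None)
                    resid = y - X[~m] @ beta
                    dof = max(resid.size - X.shape[1], 1)
                    sigma = np.sqrt(resid @ resid / dof)
                    # Replace missing values with prediction plus noise.
                    filled[m, t, j] = X[m] @ beta + rng.normal(0.0, sigma, m.sum())
    return filled

# Simulate a full data set: n individuals, ten time blocks, two variables
# correlated with each other and over time (assumed AR(1) structure).
rng = np.random.default_rng(42)
n, T, p = 500, 10, 2
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
full = np.empty((n, T, p))
full[:, 0, :] = rng.multivariate_normal(np.zeros(p), cov, size=n)
for t in range(1, T):
    shock = rng.multivariate_normal(np.zeros(p), cov, size=n)
    full[:, t, :] = 0.8 * full[:, t - 1, :] + 0.6 * shock

obs = make_mcar(full, 0.7, rng)   # about 70% missing in each time block
imputed = two_fold_sketch(obs)
```

Restricting the predictors to the window `lo:hi` keeps each regression small however many time blocks there are, which is the property the abstract credits with avoiding the collinearity and over-fitting of a single model spanning all blocks.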
id pubmed-4285297
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4285297 2015-01-26 Stat Med Research Articles BlackWell Publishing Ltd 2014-09-20 2014-04-30 /pmc/articles/PMC4285297/ /pubmed/24782349 http://dx.doi.org/10.1002/sim.6184 Text en
© 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. http://creativecommons.org/licenses/by/3.0/ This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
topic Research Articles