The proportion of missing data should not be used to guide decisions on multiple imputation

OBJECTIVES: Researchers are concerned whether multiple imputation (MI) or complete case analysis should be used when a large proportion of data are missing. We aimed to provide guidance for drawing conclusions from data with a large proportion of missingness. STUDY DESIGN AND SETTING: Via simulation...

Bibliographic Details
Main Authors: Madley-Dowd, Paul; Hughes, Rachael; Tilling, Kate; Heron, Jon
Format: Online Article Text
Language: English
Published: Elsevier 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547017/
https://www.ncbi.nlm.nih.gov/pubmed/30878639
http://dx.doi.org/10.1016/j.jclinepi.2019.02.016
_version_ 1783423619801546752
author Madley-Dowd, Paul
Hughes, Rachael
Tilling, Kate
Heron, Jon
collection PubMed
description OBJECTIVES: Researchers are concerned whether multiple imputation (MI) or complete case analysis should be used when a large proportion of data are missing. We aimed to provide guidance for drawing conclusions from data with a large proportion of missingness. STUDY DESIGN AND SETTING: Via simulations, we investigated how the proportion of missing data, the fraction of missing information (FMI), and availability of auxiliary variables affected MI performance. Outcome data were missing completely at random or missing at random (MAR). RESULTS: Provided sufficient auxiliary information was available, MI was beneficial in terms of bias and never detrimental in terms of efficiency. Models with similar FMI values, but differing proportions of missing data, also had similar precision for effect estimates. In the absence of bias, the FMI was a better guide to the efficiency gains using MI than the proportion of missing data. CONCLUSION: We provide evidence that for MAR data, valid MI reduces bias even when the proportion of missingness is large. We advise researchers to use FMI to guide choice of auxiliary variables for efficiency gain in imputation analyses, and that sensitivity analyses including different imputation models may be needed if the number of complete cases is small.
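The abstract names the fraction of missing information (FMI) without defining it. As a rough illustration only, the sketch below pools estimates from m imputed datasets with Rubin's rules and computes the FMI from the between- and within-imputation variances. It is not the authors' code; the function name `pool_rubin` and the input numbers are hypothetical, and the FMI formula used is the standard large-sample version with Rubin's degrees of freedom.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules).

    Hypothetical helper for illustration; returns a dict with the pooled
    estimate, total variance, lambda (share of variance due to missingness)
    and the fraction of missing information (FMI).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("within-imputation variances must be positive")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divides by m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance

    if b == 0:
        # Identical estimates across imputations: no information lost.
        return {"estimate": q_bar, "total_var": t, "lambda": 0.0, "fmi": 0.0}

    r = (1 + 1 / m) * b / u_bar          # relative increase in variance
    nu = (m - 1) * (1 + 1 / r) ** 2      # Rubin's degrees of freedom
    lam = (1 + 1 / m) * b / t            # proportion of variance due to missingness
    fmi = (r + 2 / (nu + 3)) / (r + 1)   # fraction of missing information
    return {"estimate": q_bar, "total_var": t, "lambda": lam, "fmi": fmi}


if __name__ == "__main__":
    # Made-up estimates from five imputed datasets, purely for demonstration.
    result = pool_rubin([0.50, 0.55, 0.45, 0.52, 0.48], [0.010] * 5)
    print(result)
```

Under these assumptions, adding an auxiliary variable that predicts the missing outcome well would be expected to shrink the between-imputation variance and therefore the FMI, which is the sense in which the abstract suggests using FMI to compare imputation models.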
format Online
Article
Text
id pubmed-6547017
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-6547017 2019-06-06 J Clin Epidemiol Article Elsevier 2019-06 /pmc/articles/PMC6547017/ /pubmed/30878639 http://dx.doi.org/10.1016/j.jclinepi.2019.02.016 Text en © 2019 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title The proportion of missing data should not be used to guide decisions on multiple imputation
topic Article