Accounting for missing data in statistical analyses: multiple imputation is not always the answer

BACKGROUND: Missing data are unavoidable in epidemiological research, potentially leading to bias and loss of precision. Multiple imputation (MI) is widely advocated as an improvement over complete case analysis (CCA). However, contrary to widespread belief, CCA is preferable to MI in some situation...

Bibliographic Details
Main Authors:	Hughes, Rachael A, Heron, Jon, Sterne, Jonathan A C, Tilling, Kate
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Methods
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6693809/ https://www.ncbi.nlm.nih.gov/pubmed/30879056 http://dx.doi.org/10.1093/ije/dyz032

author	Hughes, Rachael A; Heron, Jon; Sterne, Jonathan A C; Tilling, Kate
collection	PubMed
description	BACKGROUND: Missing data are unavoidable in epidemiological research, potentially leading to bias and loss of precision. Multiple imputation (MI) is widely advocated as an improvement over complete case analysis (CCA). However, contrary to widespread belief, CCA is preferable to MI in some situations. METHODS: We provide guidance on choice of analysis when data are incomplete. Using causal diagrams to depict missingness mechanisms, we describe when CCA will not be biased by missing data and compare MI and CCA, with respect to bias and efficiency, in a range of missing data situations. We illustrate selection of an appropriate method in practice. RESULTS: For most regression models, CCA gives unbiased results when the chance of being a complete case does not depend on the outcome after taking the covariates into consideration, which includes situations where data are missing not at random. Consequently, there are situations in which CCA analyses are unbiased while MI analyses, assuming missing at random (MAR), are biased. By contrast MI, unlike CCA, is valid for all MAR situations and has the potential to use information contained in the incomplete cases and auxiliary variables to reduce bias and/or improve precision. For this reason, MI was preferred over CCA in our real data example. CONCLUSIONS: Choice of method for dealing with missing data is crucial for validity of conclusions, and should be based on careful consideration of the reasons for the missing data, missing data patterns and the availability of auxiliary information.
format	Online Article Text
id	pubmed-6693809
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6693809 2019-08-19 Int J Epidemiol Methods Oxford University Press 2019-08 2019-03-16 /pmc/articles/PMC6693809/ /pubmed/30879056 http://dx.doi.org/10.1093/ije/dyz032 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the International Epidemiological Association. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Accounting for missing data in statistical analyses: multiple imputation is not always the answer
topic	Methods

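The RESULTS statement in the abstract, that complete case analysis (CCA) is unbiased when the chance of being a complete case does not depend on the outcome given the covariates, can be illustrated with a small simulation. The sketch below is not taken from the article: the linear model, the logistic missingness mechanisms, the sample size, and the names `ols_slope`, `logistic`, and `simulate` are all assumptions chosen for illustration. It compares the complete-case slope under two mechanisms, one where completeness depends only on the covariate `x` and one where it depends on the outcome `y`. It does not implement multiple imputation.

```python
import math
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def logistic(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=20000, true_slope=2.0, seed=1):
    """Return complete-case slopes under two hypothetical missingness mechanisms."""
    rng = random.Random(seed)
    covariate_only = []      # completeness depends on x only
    depends_on_outcome = []  # completeness depends on y
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = true_slope * x + rng.gauss(0.0, 1.0)  # assumed analysis model
        if rng.random() < logistic(x):
            covariate_only.append((x, y))
        if rng.random() < logistic(y):
            depends_on_outcome.append((x, y))
    return {
        "covariate_only": ols_slope(*zip(*covariate_only)),
        "depends_on_outcome": ols_slope(*zip(*depends_on_outcome)),
    }


if __name__ == "__main__":
    for mechanism, slope in simulate().items():
        print(f"{mechanism}: estimated slope = {slope:.3f} (true slope 2.0)")
```

Under these assumptions the complete-case slope should stay close to the true value when completeness depends on `x` alone, even though that mechanism is missing not at random with respect to `x`, and should be attenuated when completeness depends on `y`.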