
Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification

Researchers faced with incomplete data are encouraged to consider whether their data are ‘missing completely at random’ (MCAR), ‘missing at random’ (MAR) or ‘missing not at random’ (MNAR) when planning their analysis. However, there are two major problems with this classification as originally defin...


Bibliographic Details
Main authors: Lee, Katherine J, Carlin, John B, Simpson, Julie A, Moreno-Betancur, Margarita
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10396404/
https://www.ncbi.nlm.nih.gov/pubmed/36779333
http://dx.doi.org/10.1093/ije/dyad008
_version_ 1785083748462624768
author Lee, Katherine J
Carlin, John B
Simpson, Julie A
Moreno-Betancur, Margarita
author_sort Lee, Katherine J
collection PubMed
description Researchers faced with incomplete data are encouraged to consider whether their data are ‘missing completely at random’ (MCAR), ‘missing at random’ (MAR) or ‘missing not at random’ (MNAR) when planning their analysis. However, there are two major problems with this classification as originally defined by Rubin in the 1970s. First, when there are missing data in multiple variables, the plausibility of the MAR assumption is difficult to assess using substantive knowledge and is more stringent than is generally appreciated. Second, although MCAR and MAR are sufficient conditions for consistent estimation with specific methods, they are not necessary conditions and therefore this categorization does not directly determine the best approach for handling the missing data in an analysis. How best to handle missing data depends on the assumed causal relationships between variables and their missingness, and what these relationships imply in terms of the ‘recoverability’ of the target estimand (the population parameter that encodes the answer to the underlying research question). Recoverability is defined as whether the estimand can be consistently estimated from the patterns and associations in the observed data without needing to invoke external information on the extent to which the distribution of missing values might differ from that of observed values. In this manuscript we outline an approach for deciding which method to use to handle multivariable missing data in an analysis, using directed acyclic graphs to depict missingness assumptions and determining the implications in terms of recoverability of the target estimand.
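The abstract's second point, that MCAR and MAR are sufficient but not necessary for consistent estimation, can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the article: a hypothetical linear model with true slope 2.0, a logistic missingness mechanism in which the exposure X is more likely to be missing when its own value is high, and a hypothetical helper named simulate. In this setup a complete-case regression of Y on X should still estimate the slope well, while the complete-case mean of X is biased, so whether an analysis works depends on the target estimand as well as on the missingness mechanism.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Compare full-data and complete-case estimates (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Hypothetical data-generating model: Y = 1 + 2*X + noise.
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)

    # Assumed missingness mechanism: X is more likely to be missing
    # when X itself is large, so missingness depends on the unobserved value.
    p_missing = 1.0 / (1.0 + np.exp(-x))
    observed = rng.random(n) > p_missing

    # Slope of Y on X using all records (not available in practice).
    slope_full = np.polyfit(x, y, 1)[0]

    # Complete-case analysis: drop records with missing X.
    slope_cc = np.polyfit(x[observed], y[observed], 1)[0]

    # Complete-case mean of X; the true mean is 0 by construction.
    mean_x_cc = x[observed].mean()

    return slope_full, slope_cc, mean_x_cc


if __name__ == "__main__":
    slope_full, slope_cc, mean_x_cc = simulate()
    print("slope, full data:      ", round(slope_full, 3))
    print("slope, complete cases: ", round(slope_cc, 3))
    print("mean of X, complete cases:", round(mean_x_cc, 3))
```

The slope survives here because missingness depends only on X and not on Y given X; if the mechanism also depended on Y, the complete-case slope would generally be biased too.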
format Online
Article
Text
id pubmed-10396404
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-103964042023-08-03 Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification Lee, Katherine J Carlin, John B Simpson, Julie A Moreno-Betancur, Margarita Int J Epidemiol Education Corner Researchers faced with incomplete data are encouraged to consider whether their data are ‘missing completely at random’ (MCAR), ‘missing at random’ (MAR) or ‘missing not at random’ (MNAR) when planning their analysis. However, there are two major problems with this classification as originally defined by Rubin in the 1970s. First, when there are missing data in multiple variables, the plausibility of the MAR assumption is difficult to assess using substantive knowledge and is more stringent than is generally appreciated. Second, although MCAR and MAR are sufficient conditions for consistent estimation with specific methods, they are not necessary conditions and therefore this categorization does not directly determine the best approach for handling the missing data in an analysis. How best to handle missing data depends on the assumed causal relationships between variables and their missingness, and what these relationships imply in terms of the ‘recoverability’ of the target estimand (the population parameter that encodes the answer to the underlying research question). Recoverability is defined as whether the estimand can be consistently estimated from the patterns and associations in the observed data without needing to invoke external information on the extent to which the distribution of missing values might differ from that of observed values. In this manuscript we outline an approach for deciding which method to use to handle multivariable missing data in an analysis, using directed acyclic graphs to depict missingness assumptions and determining the implications in terms of recoverability of the target estimand. 
Oxford University Press 2023-02-13 /pmc/articles/PMC10396404/ /pubmed/36779333 http://dx.doi.org/10.1093/ije/dyad008 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the International Epidemiological Association. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Education Corner
Lee, Katherine J
Carlin, John B
Simpson, Julie A
Moreno-Betancur, Margarita
Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification
title Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification
topic Education Corner
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10396404/
https://www.ncbi.nlm.nih.gov/pubmed/36779333
http://dx.doi.org/10.1093/ije/dyad008