
Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables (“auxiliary variables”). Auxiliary variables that predict the partially observed...

Bibliographic Details
Main Authors: Curnow, Elinor; Tilling, Kate; Heron, Jon E.; Cornish, Rosie P.; Carpenter, James R.
Format: Online Article Text
Language: English
Published: 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7615309/
https://www.ncbi.nlm.nih.gov/pubmed/37974561
http://dx.doi.org/10.3389/fepid.2023.1237447
_version_ 1785145790992220160
author Curnow, Elinor
Tilling, Kate
Heron, Jon E.
Cornish, Rosie P.
Carpenter, James R.
collection PubMed
description Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables (“auxiliary variables”). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e., it is a “collider”), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which a complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.
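The collider mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the data-generating model, all variable names (`x`, `y`, `u`, `w`, `z`) and all parameter values are assumptions chosen for illustration. The outcome `y` is partially observed; `u` is a cause of both `y` and the auxiliary variable `z`; `w` is a cause of both `z` and the missingness of `y`; missingness also depends on the exposure `x`. For brevity the imputation step is a simplified stochastic regression imputation, averaged over several imputed data sets, rather than a full MI procedure with posterior parameter draws and Rubin's rules.

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients for y ~ design (design includes the intercept)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def impute_and_estimate(x, y, observed, aux, rng, n_imputations=20):
    """Impute missing y from x (and aux, if given), then regress y on x.

    Simplification (assumption): imputation-model parameters are fixed at
    their complete-records estimates instead of being drawn from a posterior.
    Returns the exposure coefficient averaged over the imputed data sets.
    """
    columns = [np.ones_like(x), x] + ([aux] if aux is not None else [])
    design = np.column_stack(columns)
    # Fit the imputation model using complete records only.
    coef = ols(design[observed], y[observed])
    resid_sd = np.std(y[observed] - design[observed] @ coef)
    analysis_design = np.column_stack([np.ones_like(x), x])
    n_missing = int((~observed).sum())
    slopes = []
    for _ in range(n_imputations):
        y_imp = y.copy()
        # Overwrite the (unobserved) outcomes with model prediction plus noise.
        y_imp[~observed] = design[~observed] @ coef + rng.normal(0, resid_sd, n_missing)
        slopes.append(ols(analysis_design, y_imp)[1])
    return float(np.mean(slopes))


def simulate(n=50_000, beta=1.0, seed=0):
    """Compare three estimators of beta under an assumed collider structure."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                        # exposure, fully observed
    u = rng.normal(size=n)                        # unmeasured cause of y and z
    w = rng.integers(0, 2, size=n)                # unmeasured cause of z and missingness
    y = beta * x + u                              # outcome, partially observed
    z = u + 2.0 * w + rng.normal(0, 0.5, size=n)  # auxiliary variable (collider)
    # y is missing when w == 1 and x > 0: missingness depends on x and w,
    # but not on y given x, so a complete records analysis is valid here.
    observed = ~((w == 1) & (x > 0))
    analysis_design = np.column_stack([np.ones(n), x])
    return {
        "complete_records": float(ols(analysis_design[observed], y[observed])[1]),
        "mi_no_auxiliary": impute_and_estimate(x, y, observed, None, rng),
        "mi_with_collider": impute_and_estimate(x, y, observed, z, rng),
    }


results = simulate()
for name, estimate in results.items():
    print(f"{name}: {estimate:.3f}")
```

Under these assumed parameter values, the complete records estimate and the estimate from imputation without the auxiliary variable should both sit close to the true coefficient of 1.0, while the estimate from the imputation model that includes `z` should sit noticeably above it. Conditioning on `z` links the outcome to the unmeasured cause of missingness, so the imputation model fitted to complete records no longer describes the incomplete records.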
format Online
Article
Text
id pubmed-7615309
institution National Center for Biotechnology Information
language English
publishDate 2023
record_format MEDLINE/PubMed
spelling pubmed-7615309 2023-11-16 Front Epidemiol Article 2023-09-15 /pmc/articles/PMC7615309/ /pubmed/37974561 http://dx.doi.org/10.3389/fepid.2023.1237447 Text en This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) (https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7615309/
https://www.ncbi.nlm.nih.gov/pubmed/37974561
http://dx.doi.org/10.3389/fepid.2023.1237447