Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias
Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables (“auxiliary variables”). Auxiliary variables that predict the partially observed...
Main authors: | Curnow, Elinor; Tilling, Kate; Heron, Jon E.; Cornish, Rosie P.; Carpenter, James R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7615309/ https://www.ncbi.nlm.nih.gov/pubmed/37974561 http://dx.doi.org/10.3389/fepid.2023.1237447 |
_version_ | 1785145790992220160 |
---|---|
author | Curnow, Elinor Tilling, Kate Heron, Jon E. Cornish, Rosie P. Carpenter, James R. |
author_facet | Curnow, Elinor Tilling, Kate Heron, Jon E. Cornish, Rosie P. Carpenter, James R. |
author_sort | Curnow, Elinor |
collection | PubMed |
description | Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables (“auxiliary variables”). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e., it is a “collider”), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which a complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders. |
format | Online Article Text |
id | pubmed-7615309 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7615309 2023-11-16 Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias Curnow, Elinor Tilling, Kate Heron, Jon E. Cornish, Rosie P. Carpenter, James R. Front Epidemiol Article 2023-09-15 /pmc/articles/PMC7615309/ /pubmed/37974561 http://dx.doi.org/10.3389/fepid.2023.1237447 Text en This work is licensed under a CC BY 4.0 International license (https://creativecommons.org/licenses/by/4.0/). This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) (https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Article Curnow, Elinor Tilling, Kate Heron, Jon E. Cornish, Rosie P. Carpenter, James R. Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias |
title | Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias |
title_sort | multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7615309/ https://www.ncbi.nlm.nih.gov/pubmed/37974561 http://dx.doi.org/10.3389/fepid.2023.1237447 |
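
The mechanism summarized in the description field (an auxiliary variable that shares one cause with the incomplete variable and another with its missingness) can be illustrated with a small simulation. The sketch below is not the authors' code or simulation design: the data-generating model, the variable names (`x`, `u`, `w`, `z`), the coefficient values, and the simplified imputation step (parameters are fixed at their least-squares estimates rather than drawn from a posterior) are all assumptions chosen for illustration. The outcome `y` is partially observed, a complete records analysis is valid by construction, and `z` is a collider because it depends on both `u` (a cause of `y`) and `w` (a cause of missingness).

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients for y ~ design (design includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate(n=200_000, beta=1.0, m=5, seed=0):
    """Compare exposure-coefficient estimates under a hypothetical collider setup."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # exposure, fully observed
    u = rng.normal(size=n)                      # causes the outcome and z
    w = rng.normal(size=n)                      # causes missingness and z
    y = beta * x + u + rng.normal(size=n)       # outcome, partially observed
    z = u + w + 0.5 * rng.normal(size=n)        # collider auxiliary variable
    observed = (x + w + rng.normal(size=n)) > 0  # missingness depends on x and w only

    ones = np.ones(n)
    analysis_design = np.column_stack([ones, x])

    # Complete records analysis: regress y on x among observed records.
    cra = ols(analysis_design[observed], y[observed])[1]

    def mi_slope(predictors):
        # Fit the imputation model on complete records (assumed linear, normal errors).
        design = np.column_stack([ones] + predictors)
        coef = ols(design[observed], y[observed])
        resid = y[observed] - design[observed] @ coef
        sigma = resid.std(ddof=design.shape[1])
        n_mis = int((~observed).sum())
        slopes = []
        for _ in range(m):
            # Simplification: imputation parameters are held fixed across imputations.
            y_imp = y.copy()
            y_imp[~observed] = design[~observed] @ coef + sigma * rng.normal(size=n_mis)
            slopes.append(ols(analysis_design, y_imp)[1])
        # Pooled point estimate: mean of the m per-imputation estimates.
        return float(np.mean(slopes))

    return {
        "cra": float(cra),
        "mi_without_z": mi_slope([x]),
        "mi_with_z": mi_slope([x, z]),
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}")
```

Under these assumptions, the complete records estimate and the MI estimate that omits `z` should both land close to the true coefficient `beta`, while the MI estimate that includes `z` should move away from it, because conditioning on `z` makes the outcome depend on its own missingness. The size of the shift depends entirely on the assumed coefficients and says nothing about the magnitudes reported in the article.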