Selecting the model for multiple imputation of missing data: Just use an IC!

Multiple imputation and maximum likelihood estimation (via the expectation‐maximization algorithm) are two well‐known methods readily used for analyzing data with missing values. While these two methods are often considered as being distinct from one another, multiple imputation (when using improper...

Bibliographic Details
Main authors:	Noghrehchi, Firouzeh; Stoklosa, Jakub; Penev, Spiridon; Warton, David I.
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2021
Subjects:	Research Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8248419/ https://www.ncbi.nlm.nih.gov/pubmed/33629367 http://dx.doi.org/10.1002/sim.8915

collection	PubMed
description	Multiple imputation and maximum likelihood estimation (via the expectation‐maximization algorithm) are two well‐known methods readily used for analyzing data with missing values. While these two methods are often considered as being distinct from one another, multiple imputation (when using improper imputation) is actually equivalent to a stochastic expectation‐maximization approximation to the likelihood. In this article, we exploit this key result to show that familiar likelihood‐based approaches to model selection, such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), can be used to choose the imputation model that best fits the observed data. Poor choice of imputation model is known to bias inference, and while sensitivity analysis has often been used to explore the implications of different imputation models, we show that the data can be used to choose an appropriate imputation model via conventional model selection tools. We show that BIC can be consistent for selecting the correct imputation model in the presence of missing data. We verify these results empirically through simulation studies, and demonstrate their practicality on two classical missing data examples. An interesting result we saw in simulations was that not only can parameter estimates be biased by misspecifying the imputation model, but also by overfitting the imputation model. This emphasizes the importance of using model selection not just to choose the appropriate type of imputation model, but also to decide on the appropriate level of imputation model complexity.
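The description above states that information criteria such as AIC and BIC can be used to choose among candidate imputation models. The following is a minimal sketch of that idea in a deliberately simple setting, not the authors' stochastic expectation-maximization procedure. It assumes a single response `y` that is missing at random given a fully observed covariate `x`, so that the observed-data likelihood of a Gaussian regression imputation model for `y` given `x` reduces to the likelihood of the complete cases. The simulated data, the polynomial candidate models, and the helper names `gaussian_bic` and `design` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_bic(y, X):
    """BIC of a Gaussian linear model y ~ X fitted by maximum likelihood."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n  # ML estimate of the error variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = p + 1  # regression coefficients plus the variance parameter
    return -2.0 * loglik + k * np.log(n)


def design(x, degree):
    """Polynomial design matrix with columns 1, x, ..., x**degree."""
    return np.vander(x, degree + 1, increasing=True)


# Simulated example (hypothetical): the true relationship is quadratic.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(size=n)

# Missing at random: the chance that y is missing depends only on observed x.
p_miss = 1.0 / (1.0 + np.exp(-x))
observed = rng.random(n) > p_miss

# Candidate imputation models: polynomial regressions of degree 1 to 3,
# each scored by BIC on the cases where y is observed.
bic = {d: gaussian_bic(y[observed], design(x[observed], d)) for d in (1, 2, 3)}
best_degree = min(bic, key=bic.get)
print(bic, best_degree)
```

The candidate with the smallest BIC would then be used to generate the imputations. With more general missingness patterns the observed-data likelihood is no longer the complete-case likelihood, which is where an approximation such as the one described in the article would be needed.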
format	Online Article Text
id	pubmed-8248419
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8248419 2021-07-06 Stat Med Research Articles John Wiley and Sons Inc. 2021-02-24 2021-05-10 /pmc/articles/PMC8248419/ /pubmed/33629367 http://dx.doi.org/10.1002/sim.8915 Text en © 2021 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title	Selecting the model for multiple imputation of missing data: Just use an IC!
topic	Research Articles
