How handling missing data may impact conclusions: A comparison of six different imputation methods for categorical questionnaire data

Bibliographic Details
Main authors:	Stavseth, Marianne Riksheim; Clausen, Thomas; Røislien, Jo
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2019
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329020/ https://www.ncbi.nlm.nih.gov/pubmed/30671242 http://dx.doi.org/10.1177/2050312118822912

collection	PubMed
description	OBJECTIVES: Missing data is a recurrent issue in many fields of medical research, particularly in questionnaires. The aim of this article is to describe and compare six conceptually different multiple imputation methods, alongside the commonly used complete case analysis, and to explore whether the choice of method for handling missing data might affect the clinical conclusions drawn from a regression model when the data are categorical.

METHODS: In addition to the commonly used complete case analysis, we tested the following six imputation methods: multiple imputation using expectation–maximization with bootstrapping, multiple imputation using multiple correspondence analysis, multiple imputation using latent class analysis, multiple hot deck imputation, and multivariate imputation by chained equations with two different model specifications (logistic regression and random forests). The methods are tested on real data from a questionnaire-based study in the Norwegian opioid maintenance treatment programme.

RESULTS: All methods performed relatively well when the sample size was large (n = 1000). For a smaller sample size (n = 200), the regression estimates depended heavily on the proportion of missing data. In particular, when the proportion of missing data was ⩾20%, complete case analysis, hot deck imputation and random forests produced biased estimates with coverage that was too low. Multiple imputation using multiple correspondence analysis performed best overall.

CONCLUSION: The choice of method for handling missing data has a significant impact on the clinical interpretation of the accompanying statistical analyses. With missing data, the choice of whether to impute at all, and of which imputation method to use, can influence the clinical conclusions drawn from a regression model and should therefore be given careful consideration.
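The abstract does not say how the regression estimates from the individual imputed datasets are combined into a single result. A common choice in multiple imputation is Rubin's rules, which average the per-dataset estimates and add the between-imputation spread to the variance; ignoring that extra variance would make confidence intervals too narrow, which is one way coverage can fall below its nominal level. The Python sketch below assumes Rubin's rules as the pooling step; it is not taken from the article, and the names `pool_rubin` and `PooledEstimate` and the example numbers are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class PooledEstimate:
    estimate: float      # pooled point estimate: mean of the m per-dataset estimates
    within_var: float    # average within-imputation variance, U-bar
    between_var: float   # between-imputation variance, B (divisor m - 1)
    total_var: float     # total variance, T = U-bar + (1 + 1/m) * B
    df: float            # Rubin's (1987) degrees of freedom for the pooled estimate


def pool_rubin(estimates, variances):
    """Hypothetical helper: combine one coefficient across m imputed
    datasets with Rubin's rules. `variances` are squared standard errors."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations, each with one variance")
    q_bar = statistics.fmean(estimates)
    u_bar = statistics.fmean(variances)
    b = statistics.variance(estimates, q_bar)  # sample variance across imputations
    t = u_bar + (1 + 1 / m) * b
    if b == 0:
        # Imputations agree exactly, so missingness adds no extra uncertainty.
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return PooledEstimate(q_bar, u_bar, b, t, df)
```

For example, `pool_rubin([0.50, 0.55, 0.45, 0.60, 0.40], [0.04, 0.04, 0.05, 0.04, 0.03])`, with five hypothetical log-odds ratios and their squared standard errors, gives a pooled estimate of 0.50 and a total variance of about 0.0475. This is larger than the average within-imputation variance of 0.04 because the five estimates disagree.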
id	pubmed-6329020
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	SAGE Open Med
publicationDate	2019-01-08
rights	© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Similar items