The Use of Multiple Imputation to Handle Missing Data in Secondary Datasets: Suggested Approaches when Missing Data Results from the Survey Structure

Secondary datasets are used in healthcare research because of their cost advantages, convenience, and size. However, missing data can cause problems that are difficult to resolve. This manuscript reviews possible causes of missing data and how to address them. Many researcher...

Bibliographic Details
Main author: Jo, Soojung
Format: Online Article Text
Language: English
Published: SAGE Publications 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9069597/
https://www.ncbi.nlm.nih.gov/pubmed/35502776
http://dx.doi.org/10.1177/00469580221088627
author Jo, Soojung
author_facet Jo, Soojung
author_sort Jo, Soojung
collection PubMed
description Secondary datasets are used in healthcare research because of their cost advantages, convenience, and size. However, missing data can cause problems that are difficult to resolve. This manuscript reviews possible causes of missing data and how to address them. Many researchers use multiple imputation as a solution, which consists of three phases: (a) the imputation phase, (b) the analysis phase, and (c) the pooling phase. When missing data are caused by a refusal to answer or by insufficient knowledge, multiple imputation works well. However, difficulties arise when there are problems with screening questions. If respondents do not answer a screening question, the true answer could be either “yes” or “no.” This paper suggests identifying “yes” responses on the screening question and setting them aside for use in the analysis. The reasons for this approach are the impossibility of conducting multiple imputation twice, the problem of imputation based on the population after sample weighting, and the problem of logical errors arising in estimation during the imputation phase. As an example, this manuscript uses the techniques applied to missing data from screening questions in a national US dataset. These multiple imputation techniques, illustrated with examples from the dataset, could be used by researchers in future healthcare research that relies on secondary datasets.
format Online
Article
Text
id pubmed-9069597
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-9069597 2022-05-05 The Use of Multiple Imputation to Handle Missing Data in Secondary Datasets: Suggested Approaches when Missing Data Results from the Survey Structure Jo, Soojung Inquiry Review Article
SAGE Publications 2022-05-03 /pmc/articles/PMC9069597/ /pubmed/35502776 http://dx.doi.org/10.1177/00469580221088627 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction, and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
spellingShingle Review Article
Jo, Soojung
The Use of Multiple Imputation to Handle Missing Data in Secondary Datasets: Suggested Approaches when Missing Data Results from the Survey Structure
title The Use of Multiple Imputation to Handle Missing Data in Secondary Datasets: Suggested Approaches when Missing Data Results from the Survey Structure
title_sort use of multiple imputation to handle missing data in secondary datasets: suggested approaches when missing data results from the survey structure
topic Review Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9069597/
https://www.ncbi.nlm.nih.gov/pubmed/35502776
http://dx.doi.org/10.1177/00469580221088627
work_keys_str_mv AT josoojung theuseofmultipleimputationtohandlemissingdatainsecondarydatasetssuggestedapproacheswhenmissingdataresultsfromthesurveystructure
AT josoojung useofmultipleimputationtohandlemissingdatainsecondarydatasetssuggestedapproacheswhenmissingdataresultsfromthesurveystructure
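
The description in this record names three phases of multiple imputation: (a) imputation, (b) analysis, and (c) pooling. The following is a minimal sketch of those phases, not the article's own procedure. It assumes synthetic data, a single incomplete variable `y` with a fully observed predictor `x`, a simple stochastic regression imputation, and pooling by Rubin's rules. All function and variable names (`impute_once`, `rubin_pool`, `multiple_imputation_mean`) are hypothetical, and survey weights and screening questions are deliberately left out.

```python
import math
import random
import statistics


def rubin_pool(estimates, variances):
    """Pooling phase: combine m estimates and their squared standard errors."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


def impute_once(x, y, rng):
    """Imputation phase: replace each missing y (None) with a regression
    prediction from x plus random noise. Simplified: the regression
    coefficients are not redrawn for each imputed dataset."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [xi for xi, _ in observed]
    ys = [yi for _, yi in observed]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in observed)
    residual_sd = math.sqrt(sse / (len(observed) - 2))
    return [
        yi if yi is not None else intercept + slope * xi + rng.gauss(0.0, residual_sd)
        for xi, yi in zip(x, y)
    ]


def multiple_imputation_mean(x, y, m=20, seed=0):
    """Estimate the mean of y from m imputed datasets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(x, y, rng)                # (a) imputation phase
        estimates.append(statistics.fmean(completed))     # (b) analysis phase
        variances.append(statistics.variance(completed) / len(completed))
    return rubin_pool(estimates, variances)               # (c) pooling phase


# Synthetic example (assumed data, not from the article): about 30% of y is missing.
data_rng = random.Random(42)
x = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
y = [2.0 * xi + data_rng.gauss(0.0, 1.0) for xi in x]
y_with_gaps = [None if data_rng.random() < 0.3 else yi for yi in y]

pooled_mean, total_variance = multiple_imputation_mean(x, y_with_gaps)
print(f"pooled mean = {pooled_mean:.3f}, standard error = {math.sqrt(total_variance):.3f}")
```

The total variance adds the between-imputation spread to the average within-imputation variance, so the pooled standard error reflects the uncertainty introduced by the missing values. A real analysis of a weighted national survey would normally rely on an established imputation package instead of this hand-rolled sketch.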