Evaluation of Multiple Imputation with Large Proportions of Missing Data: How Much Is Too Much?

BACKGROUND: Multiple Imputation (MI) is known as an effective method for handling missing data in public health research. However, it is not clear that the method will be effective when the data contain a high percentage of missing observations on a variable. METHODS: Using data from “Predictive Stu...

Bibliographic Details
Main authors: Lee, Jin Hyuk; Huber, J. Charles
Format: Online Article Text
Language: English
Published: Tehran University of Medical Sciences 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8426774/
https://www.ncbi.nlm.nih.gov/pubmed/34568175
http://dx.doi.org/10.18502/ijph.v50i7.6626
collection PubMed
description BACKGROUND: Multiple imputation (MI) is known as an effective method for handling missing data in public health research. However, it is not clear whether the method remains effective when a variable has a high percentage of missing observations. METHODS: Using data from the “Predictive Study of Coronary Heart Disease,” this study examined the effectiveness of MI in data with 20% to 80% missing observations, using the absolute bias (|bias|) and root mean square error (RMSE) of MI measured under Missing Completely at Random (MCAR), Missing at Random (MAR), and Not Missing at Random (NMAR) assumptions. RESULTS: The |bias| and RMSE of MI were much smaller than those of complete case analysis (CCA) under all missing-data mechanisms, especially with a high percentage of missing data. In addition, the |bias| and RMSE of MI were consistent as the number of imputations increased from M=10 to M=50. Moreover, when comparing imputation methods, the MCMC method had universally smaller |bias| and RMSE than the Regression method and the Predictive Mean Matching method under all missing-data mechanisms. CONCLUSION: As the percentage of missing data becomes higher, using MI is recommended, because MI produced less biased estimates under all missing-data mechanisms. However, when large proportions of data are missing, other factors need to be considered for proper imputation, such as the number of imputations, the imputation method, and the missing-data mechanism.
format Online
Article
Text
id pubmed-8426774
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Tehran University of Medical Sciences
record_format MEDLINE/PubMed
spelling pubmed-8426774 2021-09-24 Evaluation of Multiple Imputation with Large Proportions of Missing Data: How Much Is Too Much? Lee, Jin Hyuk; Huber, J. Charles. Iran J Public Health. Original Article. Tehran University of Medical Sciences 2021-07 /pmc/articles/PMC8426774/ /pubmed/34568175 http://dx.doi.org/10.18502/ijph.v50i7.6626 Text en Copyright © 2021 Lee et al.
Published by Tehran University of Medical Sciences. https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (https://creativecommons.org/licenses/by-nc/4.0/). Non-commercial uses of the work are permitted, provided the original work is properly cited.
topic Original Article