
Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers

BACKGROUND: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings. OBJECTIVE: In this paper several statistical approaches to data “missingness” are discussed and tested in a simulation study. Basic approaches (complete case analysis, mean...


Bibliographic Details
Main authors:	Blankers, Matthijs; Koeter, Maarten W J; Schippers, Gerard M
Format:	Text
Language:	English
Published:	Gunther Eysenbach 2010
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3057309/ https://www.ncbi.nlm.nih.gov/pubmed/21169167 http://dx.doi.org/10.2196/jmir.1448

author	Blankers, Matthijs; Koeter, Maarten W J; Schippers, Gerard M
author_facet	Blankers, Matthijs; Koeter, Maarten W J; Schippers, Gerard M
author_sort	Blankers, Matthijs
collection	PubMed
description	BACKGROUND: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings. OBJECTIVE: In this paper several statistical approaches to data “missingness” are discussed and tested in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in this analysis, and strengths and weaknesses are discussed. METHODS: The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated by performing a bootstrapping simulation study. RESULTS: In the performed simulation study, the use of multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Among the tested approaches, Amelia II outperformed the others, led to the smallest deviation from the reference value (Cohen’s d = 0.06), and had the largest coverage percentage of the reference confidence interval (96%). CONCLUSIONS: The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some of the often-used approaches (LOCF, complete cases analysis) did not perform well, and, hence, we recommend not using these. 
Accumulating support for the analysis of multiple imputed datasets is seen in more recent versions of some of the widely used statistical software programs making the use of multiple imputation more readily available to less mathematically inclined researchers.
format	Text
id	pubmed-3057309
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Gunther Eysenbach
record_format	MEDLINE/PubMed
spelling	pubmed-3057309 2011-03-15 J Med Internet Res Original Paper Gunther Eysenbach 2010-12-19 /pmc/articles/PMC3057309/ /pubmed/21169167 http://dx.doi.org/10.2196/jmir.1448 Text en ©Matthijs Blankers, Maarten W J Koeter, Gerard M Schippers. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 19.12.2010 http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
title	Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers
title_sort	missing data approaches in ehealth research: simulation study and a tutorial for nonmathematically inclined researchers
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3057309/ https://www.ncbi.nlm.nih.gov/pubmed/21169167 http://dx.doi.org/10.2196/jmir.1448
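
The simulation design summarized in the description field (missingness at random induced in one variable for about 50% of the cases, then estimates from different approaches compared against a reference value) can be illustrated with a small sketch. This is not the authors' code and does not use their dataset or any of the tested programs (NORM, MICE, Amelia II, SPSS MI). The data are synthetic, all function names are hypothetical, and the "multiple imputation" here is a simplified stochastic regression version that only averages point estimates. A full implementation would also draw the regression parameters for each imputation and pool the variances.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic stand-in data (an assumption): x is always observed, y depends on x."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [5.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def induce_mar(xs, ys, rng):
    """Delete y with a probability that depends only on the observed x (MAR).

    Cases with x above the median lose y with probability 0.8, the others
    with probability 0.2, so roughly 50% of the cases end up missing.
    """
    cut = statistics.median(xs)
    return [None if rng.random() < (0.8 if x > cut else 0.2) else y
            for x, y in zip(xs, ys)]


def fit_line(xs, ys):
    """Least-squares fit of y on x; returns intercept, slope, residual SD."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd


def complete_case_mean(y_obs):
    """Complete case analysis: ignore every case with a missing y."""
    return statistics.fmean(y for y in y_obs if y is not None)


def multiple_imputation_mean(xs, y_obs, rng, m=20):
    """Simplified MI: m stochastic regression imputations, estimates averaged."""
    pairs = [(x, y) for x, y in zip(xs, y_obs) if y is not None]
    a, b, sd = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    estimates = []
    for _ in range(m):
        # Fill each gap with the regression prediction plus random noise.
        filled = [y if y is not None else a + b * x + rng.gauss(0.0, sd)
                  for x, y in zip(xs, y_obs)]
        estimates.append(statistics.fmean(filled))
    return statistics.fmean(estimates)


rng = random.Random(2010)
xs, ys = make_data(2000, rng)
y_obs = induce_mar(xs, ys, rng)
true_mean = statistics.fmean(ys)          # reference value from the full data
cc_mean = complete_case_mean(y_obs)
mi_mean = multiple_imputation_mean(xs, y_obs, rng)
print(f"reference {true_mean:.2f}  complete-case {cc_mean:.2f}  MI {mi_mean:.2f}")
```

Because high values of x are more likely to lose their y, the complete-case mean is pulled below the reference value, while the imputation-based estimate uses the observed x to recover it. This mirrors the direction of the abstract's conclusion only under the assumptions of this toy setup.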
