
Dealing with missing data in a multi-question depression scale: a comparison of imputation methods


Bibliographic details
Main authors:	Shrive, Fiona M; Stuart, Heather; Quan, Hude; Ghali, William A
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1716168/ https://www.ncbi.nlm.nih.gov/pubmed/17166270 http://dx.doi.org/10.1186/1471-2288-6-57

collection	PubMed
description	BACKGROUND: Missing data present a challenge to many research projects. The problem is often pronounced in studies using self-report scales, and literature addressing strategies for dealing with missing data in such circumstances is scarce. The objective of this study was to compare six imputation techniques for dealing with missing data in the Zung Self-reported Depression scale (SDS). METHODS: 1580 participants from a surgical outcomes study completed the SDS. The SDS is a 20-question scale that respondents complete by circling a value from 1 to 4 for each question. The responses are summed, and respondents are classified as exhibiting depressive symptoms when their total score is over 40. Missing values were simulated by randomly selecting questions whose values were then deleted (a missing completely at random simulation). Missing at random and missing not at random simulations were also performed. Six imputation methods were then considered: 1) multiple imputation (MI), 2) single regression, 3) individual mean, 4) overall mean, 5) participant's preceding response, and 6) random selection of a value from 1 to 4. For each method, the imputed mean SDS score and standard deviation were compared to the population statistics. The Spearman correlation coefficient, percent misclassified and the Kappa statistic were also calculated. RESULTS: When 10% of values are missing, all the imputation methods except random selection produce Kappa statistics greater than 0.80, indicating 'near perfect' agreement. MI produces the most valid imputed values, with a high Kappa statistic (0.89), although single regression and individual mean imputation also produced favorable results. As the percentage of missing information increased to 30%, or when unbalanced missing data were introduced, MI maintained a high Kappa statistic. The individual mean and single regression methods produced Kappas in the 'substantial agreement' range (0.76 and 0.74, respectively). CONCLUSION: Multiple imputation is the most accurate method for dealing with missing data in most of the missing data scenarios we assessed for the SDS. Imputing the individual's mean is also an appropriate and simple method that may be more interpretable to the majority of medical readers. Researchers should consider conducting methodological assessments such as this one when confronted with missing data. The optimal method should balance validity, ease of interpretability for readers, and the analysis expertise of the research team.
id	pubmed-1716168
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-1716168 2006-12-22 BMC Med Res Methodol Research Article BioMed Central 2006-12-13 /pmc/articles/PMC1716168/ /pubmed/17166270 http://dx.doi.org/10.1186/1471-2288-6-57 Text en Copyright © 2006 Shrive et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
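The SDS scoring rule and the simpler single-imputation methods listed in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical sketch and not the authors' code. The function names are illustrative. Several details are assumptions: MCAR deletion is modelled as dropping each response independently with probability `p`, "overall mean" is read as the item's mean across all participants who answered it, and a missing first item under the "preceding response" rule is back-filled from the first answered item. Multiple imputation and single regression require a statistics package and are not shown.

```python
import random
from typing import List, Optional

CUTOFF = 40  # total SDS score above 40 -> classified as depressive symptoms
Row = List[Optional[float]]  # one participant's 20 answers; None = missing


def total_score(row: List[float]) -> float:
    """Sum of the item responses (each originally 1..4)."""
    return sum(row)


def classify(total: float) -> bool:
    """True when the respondent is classified as having depressive symptoms."""
    return total > CUTOFF


def delete_mcar(data: List[List[int]], p: float, rng: random.Random) -> List[Row]:
    """MCAR (assumed mechanism): each response is deleted independently with probability p."""
    return [[None if rng.random() < p else float(v) for v in row] for row in data]


def impute_individual_mean(row: Row) -> List[float]:
    """Replace gaps with the mean of this participant's own observed answers."""
    seen = [v for v in row if v is not None]
    m = sum(seen) / len(seen)  # assumes at least one answered item
    return [m if v is None else v for v in row]


def impute_overall_mean(data: List[Row]) -> List[List[float]]:
    """Replace gaps with the item's mean over all participants who answered it
    (one possible reading of 'overall mean'; an assumption)."""
    n_items = len(data[0])
    means = []
    for j in range(n_items):
        col = [row[j] for row in data if row[j] is not None]
        means.append(sum(col) / len(col))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def impute_preceding(row: Row) -> List[float]:
    """Carry the participant's preceding answer forward. Leading gaps are
    back-filled from the first answered item (an assumption)."""
    first = next(v for v in row if v is not None)  # assumes one answered item
    out, last = [], first
    for v in row:
        last = last if v is None else v
        out.append(last)
    return out


def impute_random(row: Row, rng: random.Random) -> List[float]:
    """Fill each gap with a value drawn uniformly from 1..4."""
    return [float(rng.randint(1, 4)) if v is None else v for v in row]
```

For example, `[impute_individual_mean(r) for r in delete_mcar(data, 0.10, random.Random(0))]` imputes a 10% MCAR copy of `data` (a hypothetical list of 20-item response lists), after which each row can be passed through `total_score` and `classify`. Mean imputation can produce non-integer totals; the abstract does not say whether the study rounded them.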
