
Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients

BACKGROUND: Missing data is a common statistical problem in healthcare datasets from populations of older people. Some argue that arbitrarily assuming the mechanism responsible for the missingness and therefore the method for dealing with this missingness is not the best option—but is this always tr...


Bibliographic Details
Main Authors: Kaambwa, Billingsley, Bryan, Stirling, Billingham, Lucinda
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441253/
https://www.ncbi.nlm.nih.gov/pubmed/22738344
http://dx.doi.org/10.1186/1756-0500-5-330
author Kaambwa, Billingsley
Bryan, Stirling
Billingham, Lucinda
collection PubMed
description BACKGROUND: Missing data is a common statistical problem in healthcare datasets from populations of older people. Some argue that arbitrarily assuming the mechanism responsible for the missingness, and therefore the method for dealing with this missingness, is not the best option, but is this always true? This paper explores what happens when extra information that suggests that a particular mechanism is responsible for missing data is disregarded and methods for dealing with the missing data are chosen arbitrarily. Regression models based on 2,533 intermediate care (IC) patients from the largest evaluation of IC done and published in the UK to date were used to explain variation in costs, EQ-5D and Barthel index. Three methods for dealing with missingness were utilised, each assuming a different mechanism as being responsible for the missing data: complete case analysis (assuming missing completely at random, MCAR), multiple imputation (assuming missing at random, MAR) and Heckman selection model (assuming missing not at random, MNAR). Differences in results were gauged by examining the signs of coefficients as well as the sizes of both coefficients and associated standard errors.
RESULTS: Extra information strongly suggested that missing cost data were MCAR. The results show that MCAR and MAR-based methods yielded similar results, with sizes of most coefficients and standard errors differing by less than 3.4%, while those based on MNAR methods were statistically different (up to 730% bigger). Significant variables in all regression models also had the same direction of influence on costs. All three mechanisms of missingness were shown to be potential causes of the missing EQ-5D and Barthel data. The method chosen to deal with missing data did not seem to have any significant effect on the results for these data, as they led to broadly similar conclusions with sizes of coefficients and standard errors differing by less than 54% and 322%, respectively.
CONCLUSIONS: Arbitrary selection of methods to deal with missing data should be avoided. Using extra information gathered during the data collection exercise about the cause of missingness to guide this selection would be more appropriate.
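The comparison described in the abstract (fit the same regression under different missing-data methods, then compare the sizes of the coefficients) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the data are synthetic rather than the study's IC data, the helper names `ols_slope_intercept` and `pct_diff` are hypothetical, and the benchmark is the full synthetic sample rather than multiple imputation or a Heckman model. The sketch shows only that, when outcome values are deleted completely at random (MCAR), a complete case analysis gives a slope close to the full-data slope.

```python
import random


def ols_slope_intercept(xs, ys):
    """Simple least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pct_diff(reference, other):
    """Absolute difference between two coefficients, as a percentage of the reference."""
    return 100.0 * abs(other - reference) / abs(reference)


# Synthetic data (assumption): outcome depends linearly on one covariate.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(2000)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# MCAR: each outcome is dropped with probability 0.3, independent of x and y.
observed = [rng.random() > 0.3 for _ in xs]
complete_cases = [(x, y) for x, y, keep in zip(xs, ys, observed) if keep]

# Fit on the full sample and on the complete cases only.
full_slope, _ = ols_slope_intercept(xs, ys)
cc_slope, _ = ols_slope_intercept(
    [x for x, _ in complete_cases], [y for _, y in complete_cases]
)

# Percentage gap between the two slope estimates.
slope_gap = pct_diff(full_slope, cc_slope)
print(f"complete cases kept: {len(complete_cases)} of {len(xs)}")
print(f"slope gap: {slope_gap:.3f}%")
```

If the deletion probability were made to depend on the outcome itself (an MNAR mechanism), the same complete case fit would no longer be expected to track the full-data slope, which is the situation selection models are meant to address.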
format Online
Article
Text
id pubmed-3441253
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3441253 2012-09-18
BMC Res Notes
Research Article
BioMed Central 2012-06-27
/pmc/articles/PMC3441253/ /pubmed/22738344 http://dx.doi.org/10.1186/1756-0500-5-330
Text en
Copyright ©2012 Kaambwa et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441253/
https://www.ncbi.nlm.nih.gov/pubmed/22738344
http://dx.doi.org/10.1186/1756-0500-5-330