
Probabilistic linking to enhance deterministic algorithms and reduce linkage errors in hospital administrative data

BACKGROUND: The pseudonymisation algorithm used to link together episodes of care belonging to the same patient in England [Hospital Episode Statistics ID (HESID)] has never undergone any formal evaluation to determine the extent of data linkage error. OBJECTIVE: To quantify improvements in linkage...


Bibliographic Details
Main Authors: Hagger-Johnson, Gareth; Harron, Katie; Goldstein, Harvey; Aldridge, Rob; Gilbert, Ruth
Format: Online Article Text
Language: English
Published: 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6217911/
https://www.ncbi.nlm.nih.gov/pubmed/28749318
http://dx.doi.org/10.14236/jhi.v24i2.891
author Hagger-Johnson, Gareth
Harron, Katie
Goldstein, Harvey
Aldridge, Rob
Gilbert, Ruth
author_sort Hagger-Johnson, Gareth
collection PubMed
description BACKGROUND: The pseudonymisation algorithm used to link together episodes of care belonging to the same patient in England [Hospital Episode Statistics ID (HESID)] has never undergone any formal evaluation to determine the extent of data linkage error. OBJECTIVE: To quantify improvements in linkage accuracy from adding probabilistic linkage to existing deterministic HESID algorithms. METHODS: Inpatient admissions to National Health Service (NHS) hospitals in England (HES) over 17 years (1998 to 2015) for a sample of patients (born 13th or 28th of months in 1992/1998/2005/2012). We compared the existing deterministic algorithm with one that included an additional probabilistic step, in relation to a reference standard created using enhanced probabilistic matching with additional clinical and demographic information. Missed and false matches were quantified and the impact on estimates of hospital readmission within one year was determined. RESULTS: HESID produced a high missed match rate, improving over time (8.6% in 1998 to 0.4% in 2015). Missed matches were more common for ethnic minorities, those living in areas of high socio-economic deprivation, foreign patients and those with ‘no fixed abode’. Estimates of the readmission rate were biased for several patient groups owing to missed matches, which were reduced for nearly all groups. CONCLUSION: Probabilistic linkage of HES reduced missed matches and bias in estimated readmission rates, with clear implications for commissioning, service evaluation and performance monitoring of hospitals. The existing algorithm should be modified to address data linkage error, and a retrospective update of the existing data would address existing linkage errors and their implications.
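The comparison described in METHODS (a deterministic rule alone versus the same rule followed by a probabilistic step, each judged against a reference standard) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the identifier names, the m- and u-probabilities, the threshold and the exact-match rule are hypothetical and are not the HESID algorithm or the authors' implementation. The probabilistic step uses generic Fellegi-Sunter style match weights.

```python
from math import log2

# Hypothetical m- and u-probabilities per identifier (illustrative values only):
# m = P(field agrees | same patient), u = P(field agrees | different patients).
MU = {
    "nhs_number": (0.95, 0.0001),
    "dob": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "postcode": (0.80, 0.001),
}


def deterministic_match(a, b):
    """Illustrative exact-match rule: NHS number, date of birth and sex all present and equal."""
    return all(a[k] is not None and a[k] == b[k] for k in ("nhs_number", "dob", "sex"))


def match_weight(a, b):
    """Fellegi-Sunter style weight: sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in MU.items():
        if a[field] is None or b[field] is None:
            continue  # a missing value contributes no evidence either way
        if a[field] == b[field]:
            total += log2(m / u)
        else:
            total += log2((1 - m) / (1 - u))
    return total


def linked(a, b, threshold=10.0):
    """Deterministic rule first, then the probabilistic step (threshold is an assumption)."""
    return deterministic_match(a, b) or match_weight(a, b) >= threshold


def missed_match_rate(pairs, truth, link_fn):
    """Share of reference-standard true matches that link_fn fails to link."""
    true_pairs = [p for p, is_match in zip(pairs, truth) if is_match]
    missed = sum(1 for a, b in true_pairs if not link_fn(a, b))
    return missed / len(true_pairs)


# Toy episodes: e2 belongs to the same patient as e1 but lacks an NHS number.
e1 = {"nhs_number": "123", "dob": "1992-01-13", "sex": "F", "postcode": "AB1 2CD"}
e2 = {"nhs_number": None, "dob": "1992-01-13", "sex": "F", "postcode": "AB1 2CD"}
e3 = {"nhs_number": "999", "dob": "2005-06-28", "sex": "M", "postcode": "ZZ9 9ZZ"}

pairs = [(e1, e2), (e1, dict(e1)), (e1, e3)]
truth = [True, True, False]  # stand-in for the reference standard

rate_deterministic = missed_match_rate(pairs, truth, deterministic_match)
rate_combined = missed_match_rate(pairs, truth, linked)
```

In this toy data the deterministic rule misses the pair with the absent NHS number, while the added probabilistic step recovers it from agreement on the remaining identifiers; false matches would be counted analogously over the pairs that the reference standard marks as non-matches.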
format Online
Article
Text
id pubmed-6217911
institution National Center for Biotechnology Information
language English
publishDate 2017
record_format MEDLINE/PubMed
spelling pubmed-6217911 2018-11-05 Probabilistic linking to enhance deterministic algorithms and reduce linkage errors in hospital administrative data Hagger-Johnson, Gareth Harron, Katie Goldstein, Harvey Aldridge, Rob Gilbert, Ruth J Innov Health Inform Article 2017-06-30 /pmc/articles/PMC6217911/ /pubmed/28749318 http://dx.doi.org/10.14236/jhi.v24i2.891 Text en http://creativecommons.org/licenses/by/4.0/ Published by BCS, The Chartered Institute for IT under Creative Commons license http://creativecommons.org/licenses/by/4.0/
title Probabilistic linking to enhance deterministic algorithms and reduce linkage errors in hospital administrative data
topic Article