Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records

OBJECTIVES: Our aim was to estimate the rate of data linkage error in Hospital Episode Statistics (HES) by testing the HESID pseudoanonymisation algorithm against a reference standard, in a national registry of paediatric intensive care records. SETTING: The Paediatric Intensive Care Audit Network (...

Bibliographic Details
Main Authors: Hagger-Johnson, Gareth, Harron, Katie, Fleming, Tom, Gilbert, Ruth, Goldstein, Harvey, Landy, Rebecca, Parslow, Roger C
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4550723/
https://www.ncbi.nlm.nih.gov/pubmed/26297363
http://dx.doi.org/10.1136/bmjopen-2015-008118
_version_ 1782387486496915456
author Hagger-Johnson, Gareth
Harron, Katie
Fleming, Tom
Gilbert, Ruth
Goldstein, Harvey
Landy, Rebecca
Parslow, Roger C
author_facet Hagger-Johnson, Gareth
Harron, Katie
Fleming, Tom
Gilbert, Ruth
Goldstein, Harvey
Landy, Rebecca
Parslow, Roger C
author_sort Hagger-Johnson, Gareth
collection PubMed
description OBJECTIVES: Our aim was to estimate the rate of data linkage error in Hospital Episode Statistics (HES) by testing the HESID pseudoanonymisation algorithm against a reference standard, in a national registry of paediatric intensive care records. SETTING: The Paediatric Intensive Care Audit Network (PICANet) database, covering 33 paediatric intensive care units in England, Scotland and Wales. PARTICIPANTS: Data from infants and young people aged 0–19 years admitted between 1 January 2004 and 21 February 2014. PRIMARY AND SECONDARY OUTCOME MEASURES: PICANet admission records were classified as matches (records belonging to the same patient who had been readmitted) or non-matches (records belonging to different patients) after applying the HESID algorithm to PICANet records. False-match and missed-match rates were calculated by comparing results of the HESID algorithm with the reference standard PICANet ID. The effect of linkage errors on readmission rate was evaluated. RESULTS: Of 166 406 admissions, 88 596 were true matches (where the same patient had been readmitted). The HESID pseudonymisation algorithm produced few false matches (n=176/77 810; 0.2%) but a larger proportion of missed matches (n=3609/88 596; 4.1%). The true readmission rate was underestimated by 3.8% due to linkage errors. Patients who were younger, male, from Asian/Black/Other ethnic groups (vs White) were more likely to experience a false match. Missed matches were more common for younger patients, for Asian/Black/Other ethnic groups (vs White) and for patients whose records had missing data. CONCLUSIONS: The deterministic algorithm used to link all episodes of hospital care for the same patient in England has a high missed match rate which underestimates the true readmission rate and will produce biased analyses. To reduce linkage error, pseudoanonymisation algorithms need to be validated against good quality reference standards. 
Pseudonymisation of data ‘at source’ does not itself address errors in patient identifiers and the impact these errors have on data linkage.
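The false-match and missed-match percentages quoted in the description can be re-derived from the counts it reports. The following is a minimal sketch of that arithmetic only, not the authors' analysis code; the variable and function names are illustrative, and it assumes the false-match denominator (77 810) is the number of admissions that were not true matches, which is what the reported totals imply.

```python
# Counts taken from the RESULTS section of the description above.
total_admissions = 166_406
true_matches = 88_596        # admissions where the same patient had been readmitted
true_non_matches = 77_810    # assumed: the remaining admissions (different patients)
false_matches = 176          # linked by the algorithm but not by the reference standard
missed_matches = 3_609       # linked by the reference standard but not by the algorithm


def rate_percent(errors: int, denominator: int) -> float:
    """Return errors as a percentage of denominator, rounded to one decimal place."""
    return round(100 * errors / denominator, 1)


# The two denominators should add up to the total number of admissions.
assert true_matches + true_non_matches == total_admissions

false_match_rate = rate_percent(false_matches, true_non_matches)
missed_match_rate = rate_percent(missed_matches, true_matches)

print(f"False-match rate:  {false_match_rate}%")
print(f"Missed-match rate: {missed_match_rate}%")
```

The sketch covers only the two error rates; the description does not give enough detail to reproduce the 3.8% underestimate of the readmission rate, so that figure is not derived here.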
format Online
Article
Text
id pubmed-4550723
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-45507232015-08-31 Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records Hagger-Johnson, Gareth Harron, Katie Fleming, Tom Gilbert, Ruth Goldstein, Harvey Landy, Rebecca Parslow, Roger C BMJ Open Research Methods OBJECTIVES: Our aim was to estimate the rate of data linkage error in Hospital Episode Statistics (HES) by testing the HESID pseudoanonymisation algorithm against a reference standard, in a national registry of paediatric intensive care records. SETTING: The Paediatric Intensive Care Audit Network (PICANet) database, covering 33 paediatric intensive care units in England, Scotland and Wales. PARTICIPANTS: Data from infants and young people aged 0–19 years admitted between 1 January 2004 and 21 February 2014. PRIMARY AND SECONDARY OUTCOME MEASURES: PICANet admission records were classified as matches (records belonging to the same patient who had been readmitted) or non-matches (records belonging to different patients) after applying the HESID algorithm to PICANet records. False-match and missed-match rates were calculated by comparing results of the HESID algorithm with the reference standard PICANet ID. The effect of linkage errors on readmission rate was evaluated. RESULTS: Of 166 406 admissions, 88 596 were true matches (where the same patient had been readmitted). The HESID pseudonymisation algorithm produced few false matches (n=176/77 810; 0.2%) but a larger proportion of missed matches (n=3609/88 596; 4.1%). The true readmission rate was underestimated by 3.8% due to linkage errors. Patients who were younger, male, from Asian/Black/Other ethnic groups (vs White) were more likely to experience a false match. Missed matches were more common for younger patients, for Asian/Black/Other ethnic groups (vs White) and for patients whose records had missing data. 
CONCLUSIONS: The deterministic algorithm used to link all episodes of hospital care for the same patient in England has a high missed match rate which underestimates the true readmission rate and will produce biased analyses. To reduce linkage error, pseudoanonymisation algorithms need to be validated against good quality reference standards. Pseudonymisation of data ‘at source’ does not itself address errors in patient identifiers and the impact these errors have on data linkage. BMJ Publishing Group 2015-08-21 /pmc/articles/PMC4550723/ /pubmed/26297363 http://dx.doi.org/10.1136/bmjopen-2015-008118 Text en Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/
spellingShingle Research Methods
Hagger-Johnson, Gareth
Harron, Katie
Fleming, Tom
Gilbert, Ruth
Goldstein, Harvey
Landy, Rebecca
Parslow, Roger C
Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records
title Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records
title_full Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records
title_fullStr Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records
title_full_unstemmed Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records
title_short Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records
title_sort data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records
topic Research Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4550723/
https://www.ncbi.nlm.nih.gov/pubmed/26297363
http://dx.doi.org/10.1136/bmjopen-2015-008118