
Evaluating bias due to data linkage error in electronic healthcare records

BACKGROUND: Linkage of electronic healthcare records is becoming increasingly important for research purposes. However, linkage error due to mis-recorded or missing identifiers can lead to biased results. We evaluated the impact of linkage error on estimated infection rates using two different metho...


Bibliographic Details
Main Authors:	Harron, Katie; Wade, Angie; Gilbert, Ruth; Muller-Pebody, Berit; Goldstein, Harvey
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015706/ https://www.ncbi.nlm.nih.gov/pubmed/24597489 http://dx.doi.org/10.1186/1471-2288-14-36

_version_	1782315383026352128
author	Harron, Katie Wade, Angie Gilbert, Ruth Muller-Pebody, Berit Goldstein, Harvey
author_facet	Harron, Katie Wade, Angie Gilbert, Ruth Muller-Pebody, Berit Goldstein, Harvey
author_sort	Harron, Katie
collection	PubMed
description	BACKGROUND: Linkage of electronic healthcare records is becoming increasingly important for research purposes. However, linkage error due to mis-recorded or missing identifiers can lead to biased results. We evaluated the impact of linkage error on estimated infection rates using two different methods for classifying links: highest-weight (HW) classification using probabilistic match weights and prior-informed imputation (PII) using match probabilities. METHODS: A gold-standard dataset was created through deterministic linkage of unique identifiers in admission data from two hospitals and infection data recorded at the hospital laboratories (original data). Unique identifiers were then removed and data were re-linked by date of birth, sex and Soundex using two classification methods: i) HW classification - accepting the candidate record with the highest weight exceeding a threshold and ii) PII - imputing values from a match probability distribution. To evaluate methods for linking data with different error rates, non-random error and different match rates, we generated simulation data. Each set of simulated files was linked using both classification methods. Infection rates in the linked data were compared with those in the gold-standard data. RESULTS: In the original gold-standard data, 1496/20924 admissions linked to an infection. In the linked original data, PII provided the least biased results: 1481 and 1457 infections (upper/lower thresholds) compared with 1316 and 1287 (HW upper/lower thresholds). In the simulated data, substantial bias (up to 112%) was introduced when linkage error varied by hospital. Bias was also greater when the match rate was low or the identifier error rate was high and in these cases, PII performed better than HW classification at reducing bias due to false matches. CONCLUSIONS: This study highlights the importance of evaluating the potential impact of linkage error on results. PII can help incorporate linkage uncertainty into analysis and reduce bias due to linkage error, without requiring identifiers.
format	Online Article Text
id	pubmed-4015706
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4015706 2014-05-23 Evaluating bias due to data linkage error in electronic healthcare records Harron, Katie Wade, Angie Gilbert, Ruth Muller-Pebody, Berit Goldstein, Harvey BMC Med Res Methodol Research Article BACKGROUND: Linkage of electronic healthcare records is becoming increasingly important for research purposes. However, linkage error due to mis-recorded or missing identifiers can lead to biased results. We evaluated the impact of linkage error on estimated infection rates using two different methods for classifying links: highest-weight (HW) classification using probabilistic match weights and prior-informed imputation (PII) using match probabilities. METHODS: A gold-standard dataset was created through deterministic linkage of unique identifiers in admission data from two hospitals and infection data recorded at the hospital laboratories (original data). Unique identifiers were then removed and data were re-linked by date of birth, sex and Soundex using two classification methods: i) HW classification - accepting the candidate record with the highest weight exceeding a threshold and ii) PII - imputing values from a match probability distribution. To evaluate methods for linking data with different error rates, non-random error and different match rates, we generated simulation data. Each set of simulated files was linked using both classification methods. Infection rates in the linked data were compared with those in the gold-standard data. RESULTS: In the original gold-standard data, 1496/20924 admissions linked to an infection. In the linked original data, PII provided the least biased results: 1481 and 1457 infections (upper/lower thresholds) compared with 1316 and 1287 (HW upper/lower thresholds). In the simulated data, substantial bias (up to 112%) was introduced when linkage error varied by hospital. Bias was also greater when the match rate was low or the identifier error rate was high and in these cases, PII performed better than HW classification at reducing bias due to false matches. CONCLUSIONS: This study highlights the importance of evaluating the potential impact of linkage error on results. PII can help incorporate linkage uncertainty into analysis and reduce bias due to linkage error, without requiring identifiers. BioMed Central 2014-03-05 /pmc/articles/PMC4015706/ /pubmed/24597489 http://dx.doi.org/10.1186/1471-2288-14-36 Text en Copyright © 2014 Harron et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle	Research Article Harron, Katie Wade, Angie Gilbert, Ruth Muller-Pebody, Berit Goldstein, Harvey Evaluating bias due to data linkage error in electronic healthcare records
title	Evaluating bias due to data linkage error in electronic healthcare records
title_full	Evaluating bias due to data linkage error in electronic healthcare records
title_fullStr	Evaluating bias due to data linkage error in electronic healthcare records
title_full_unstemmed	Evaluating bias due to data linkage error in electronic healthcare records
title_short	Evaluating bias due to data linkage error in electronic healthcare records
title_sort	evaluating bias due to data linkage error in electronic healthcare records
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015706/ https://www.ncbi.nlm.nih.gov/pubmed/24597489 http://dx.doi.org/10.1186/1471-2288-14-36
work_keys_str_mv	AT harronkatie evaluatingbiasduetodatalinkageerrorinelectronichealthcarerecords AT wadeangie evaluatingbiasduetodatalinkageerrorinelectronichealthcarerecords AT gilbertruth evaluatingbiasduetodatalinkageerrorinelectronichealthcarerecords AT mullerpebodyberit evaluatingbiasduetodatalinkageerrorinelectronichealthcarerecords AT goldsteinharvey evaluatingbiasduetodatalinkageerrorinelectronichealthcarerecords
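
The infection counts quoted in the RESULTS part of the description can be compared with the gold-standard count directly. The sketch below is illustrative only: it assumes bias is expressed as the percentage difference between the linked count and the gold-standard count of 1496, which may not be the exact definition the authors used. The function and variable names are hypothetical. Because every count shares the same denominator (20924 admissions), the relative difference in counts equals the relative difference in rates.

```python
# Counts taken from the RESULTS part of the record's description.
GOLD_INFECTIONS = 1496   # admissions linked to an infection in the gold-standard data
ADMISSIONS = 20924       # total admissions

# Infections found after re-linking, keyed by (method, threshold).
LINKED_INFECTIONS = {
    ("PII", "upper"): 1481,
    ("PII", "lower"): 1457,
    ("HW", "upper"): 1316,
    ("HW", "lower"): 1287,
}


def relative_bias_percent(linked: int, gold: int) -> float:
    """Assumed definition: percentage difference from the gold-standard count."""
    return 100.0 * (linked - gold) / gold


def summarise() -> dict:
    """Relative bias for each method/threshold pair, rounded to one decimal place."""
    return {
        key: round(relative_bias_percent(count, GOLD_INFECTIONS), 1)
        for key, count in LINKED_INFECTIONS.items()
    }


if __name__ == "__main__":
    print(f"Gold-standard infection rate: {100.0 * GOLD_INFECTIONS / ADMISSIONS:.2f}%")
    for (method, threshold), bias in summarise().items():
        print(f"{method} ({threshold} threshold): {bias:+.1f}% relative to gold standard")
```

Under this assumed definition, both PII results fall within about 3% of the gold-standard count, while both HW results undercount by roughly 12% to 14%, consistent with the description's statement that PII gave the least biased results in the original data.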


Similar Items