A guide to evaluating linkage quality for the analysis of linked data

Linked datasets are an important resource for epidemiological and clinical studies, but linkage error can lead to biased results. For data security reasons, linkage of personal identifiers is often performed by a third party, making it difficult for researchers to assess the quality of the linked da...

Bibliographic Details
Main Authors: Harron, Katie L, Doidge, James C, Knight, Hannah E, Gilbert, Ruth E, Goldstein, Harvey, Cromwell, David A, van der Meulen, Jan H
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5837697/
https://www.ncbi.nlm.nih.gov/pubmed/29025131
http://dx.doi.org/10.1093/ije/dyx177
author Harron, Katie L
Doidge, James C
Knight, Hannah E
Gilbert, Ruth E
Goldstein, Harvey
Cromwell, David A
van der Meulen, Jan H
collection PubMed
description Linked datasets are an important resource for epidemiological and clinical studies, but linkage error can lead to biased results. For data security reasons, linkage of personal identifiers is often performed by a third party, making it difficult for researchers to assess the quality of the linked dataset in the context of specific research questions. This is compounded by a lack of guidance on how to determine the potential impact of linkage error. We describe how linkage quality can be evaluated and provide widely applicable guidance for both data providers and researchers. Using an illustrative example of a linked dataset of maternal and baby hospital records, we demonstrate three approaches for evaluating linkage quality: applying the linkage algorithm to a subset of gold standard data to quantify linkage error; comparing characteristics of linked and unlinked data to identify potential sources of bias; and evaluating the sensitivity of results to changes in the linkage procedure. These approaches can inform our understanding of the potential impact of linkage error and provide an opportunity to select the most appropriate linkage procedure for a specific analysis. Evaluating linkage quality in this way will improve the quality and transparency of epidemiological and clinical research using linked data.
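
The first two approaches named in the description (quantifying linkage error against gold standard data, and comparing linked with unlinked records) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy record identifiers, the `preterm` characteristic, and the working definitions of the two error rates (false-match rate as the share of assigned links that are wrong, missed-match rate as the share of true links not found). It is not the authors' code, and their definitions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkageErrorRates:
    false_matches: int        # assigned links absent from the gold standard
    missed_matches: int       # gold-standard links the algorithm did not find
    false_match_rate: float   # false matches / all assigned links (assumed definition)
    missed_match_rate: float  # missed matches / all true links (assumed definition)


def linkage_error_rates(assigned_links, true_links):
    """Compare links produced by a linkage algorithm with gold-standard links.

    Both arguments are iterables of (record_id_a, record_id_b) pairs.
    """
    assigned = set(assigned_links)
    truth = set(true_links)
    false_matches = assigned - truth
    missed_matches = truth - assigned
    fmr = len(false_matches) / len(assigned) if assigned else 0.0
    mmr = len(missed_matches) / len(truth) if truth else 0.0
    return LinkageErrorRates(len(false_matches), len(missed_matches), fmr, mmr)


def linked_proportion_by_group(records, linked_ids, key):
    """Proportion of records that were linked, within each level of `key`.

    Large differences between groups hint at a possible source of bias.
    """
    totals, linked = {}, {}
    for rec in records:
        group = rec[key]
        totals[group] = totals.get(group, 0) + 1
        if rec["id"] in linked_ids:
            linked[group] = linked.get(group, 0) + 1
    return {group: linked.get(group, 0) / n for group, n in totals.items()}


if __name__ == "__main__":
    # Hypothetical mother-baby pairs; identifiers are made up.
    gold = [("m1", "b1"), ("m2", "b2"), ("m3", "b3"), ("m4", "b4")]
    assigned = [("m1", "b1"), ("m2", "b2"), ("m3", "b9")]
    print(linkage_error_rates(assigned, gold))

    # Hypothetical baby records with one characteristic of interest.
    babies = [
        {"id": "b1", "preterm": True},
        {"id": "b2", "preterm": False},
        {"id": "b3", "preterm": True},
        {"id": "b4", "preterm": False},
    ]
    print(linked_proportion_by_group(babies, {"b1", "b2", "b4"}, "preterm"))
```

The third approach, sensitivity analysis, would amount to re-running the analysis of interest on the links produced under different linkage settings and comparing the results; it is omitted here because it depends on the specific linkage procedure and analysis.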
format Online
Article
Text
id pubmed-5837697
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5837697 2018-03-09
A guide to evaluating linkage quality for the analysis of linked data
Harron, Katie L; Doidge, James C; Knight, Hannah E; Gilbert, Ruth E; Goldstein, Harvey; Cromwell, David A; van der Meulen, Jan H
Int J Epidemiol, Education Corner
Oxford University Press 2017-10 2017-09-07
/pmc/articles/PMC5837697/ /pubmed/29025131 http://dx.doi.org/10.1093/ije/dyx177
Text en
© The Author 2017. Published by Oxford University Press on behalf of the International Epidemiological Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A guide to evaluating linkage quality for the analysis of linked data
topic Education Corner