An efficient record linkage scheme using graphical analysis for identifier error detection

Bibliographic Details
Main authors:	Finney, John M, Walker, A Sarah, Peto, Tim EA, Wyllie, David H
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3039555/ https://www.ncbi.nlm.nih.gov/pubmed/21284874 http://dx.doi.org/10.1186/1472-6947-11-7

author	Finney, John M Walker, A Sarah Peto, Tim EA Wyllie, David H
collection	PubMed
description	BACKGROUND: Integration of information on individuals (record linkage) is a key problem in healthcare delivery, epidemiology, and "business intelligence" applications. It is now common to be required to link very large numbers of records, often containing various combinations of theoretically unique identifiers, such as NHS numbers, which are both incomplete and error-prone. METHODS: We describe a two-step record linkage algorithm in which identifiers with high cardinality are identified or generated, and used to perform an initial exact-match-based linkage. Subsequently, the resulting clusters are studied and, if appropriate, partitioned using a graph-based algorithm that detects erroneous identifiers. RESULTS: The system was used to cluster over 250 million health records from five data sources within a large UK hospital group. Linkage, which was completed in about 30 minutes, yielded 3.6 million clusters of which about 99.8% contain, with high likelihood, records from one patient. Although computationally efficient, the algorithm's requirement for exact matching of at least one identifier of each record to another for cluster formation may be a limitation in some databases containing records of low identifier quality. CONCLUSIONS: The technique described offers a simple, fast and highly efficient two-step method for large-scale initial linkage for records commonly found in the UK's National Health Service.
format	Text
id	pubmed-3039555
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3039555 2011-02-16 BMC Med Inform Decis Mak Research Article BioMed Central 2011-02-01 /pmc/articles/PMC3039555/ /pubmed/21284874 http://dx.doi.org/10.1186/1472-6947-11-7 Text en Copyright ©2011 Finney et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	An efficient record linkage scheme using graphical analysis for identifier error detection
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3039555/ https://www.ncbi.nlm.nih.gov/pubmed/21284874 http://dx.doi.org/10.1186/1472-6947-11-7
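
The two-step scheme summarised in the description field of this record (exact-match linkage on identifiers, followed by inspection of the resulting clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the field names `nhs_number` and `hospital_id`, the sample values, the helper names `link_records` and `suspicious_clusters`, and the rule used to flag a cluster (more than one distinct value of a supposedly unique identifier) are all assumptions made for illustration. The published method uses a graph-based algorithm whose details are not given in this record.

```python
from collections import defaultdict


def link_records(records):
    """Step 1 (sketch): cluster records sharing any exact (field, value) identifier."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (field, value) -> index of the first record carrying it
    for idx, rec in enumerate(records):
        for key in rec.items():
            if key[1] is None:  # missing identifiers never link records
                continue
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


def suspicious_clusters(records, clusters, unique_field="nhs_number"):
    """Step 2 stand-in (assumption): flag clusters holding conflicting unique IDs."""
    flagged = []
    for cluster in clusters:
        values = {records[i].get(unique_field) for i in cluster} - {None}
        if len(values) > 1:
            flagged.append(cluster)
    return flagged


# Hypothetical sample data; the identifier values are placeholders.
records = [
    {"nhs_number": "A1", "hospital_id": "H1"},
    {"nhs_number": None, "hospital_id": "H1"},
    {"nhs_number": "A2", "hospital_id": "H2"},
    {"nhs_number": "A3", "hospital_id": "H1"},  # shares H1 but has a different NHS number
    {"nhs_number": "A4", "hospital_id": "H9"},
]
clusters = link_records(records)
flagged = suspicious_clusters(records, clusters)
```

In this sample, records 0, 1 and 3 fall into one cluster through the shared `hospital_id`, and that cluster is flagged because it contains two different `nhs_number` values. The sketch also shows the limitation noted in the description: a record with no identifier exactly matching another record stays in a cluster of its own.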