
Medical record linkage in health information systems by approximate string matching and clustering

BACKGROUND: Multiplication of data sources within heterogeneous healthcare information systems always results in redundant information, split among multiple databases. Our objective is to detect exact and approximate duplicates within identity records, in order to attain a better quality of informat...


Bibliographic Details
Main authors:	Sauleau, Erik A, Paumier, Jean-Philippe, Buemi, Antoine
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1274322/ https://www.ncbi.nlm.nih.gov/pubmed/16219102 http://dx.doi.org/10.1186/1472-6947-5-32

author	Sauleau, Erik A Paumier, Jean-Philippe Buemi, Antoine
collection	PubMed
description	BACKGROUND: Multiplication of data sources within heterogeneous healthcare information systems always results in redundant information, split among multiple databases. Our objective is to detect exact and approximate duplicates within identity records, in order to attain a better quality of information and to permit cross-linkage among stand-alone and clustered databases. Furthermore, we need to assist human decision making, by computing a value reflecting identity proximity. METHODS: The proposed method has three steps. The first step is to standardise and to index elementary identity fields, using blocking variables, in order to speed up information analysis. The second is to match pairs of similar records, relying on a global similarity value taken from the Porter-Jaro-Winkler algorithm. The third is to create clusters of coherent related records, using graph drawing, agglomerative clustering methods and partitioning methods. RESULTS: The batch analysis of 300,000 "supposedly" distinct identities isolates 240,000 true unique records, 24,000 duplicates (clusters composed of 2 records) and 3,000 clusters whose size is greater than or equal to 3 records. CONCLUSION: Duplicate-free databases, used in conjunction with relevant indexes and similarity values, allow immediate (i.e., real-time) proximity detection when inserting a new identity.
format	Text
id	pubmed-1274322
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1274322 2005-10-29 Medical record linkage in health information systems by approximate string matching and clustering Sauleau, Erik A Paumier, Jean-Philippe Buemi, Antoine BMC Med Inform Decis Mak Research Article BioMed Central 2005-10-11 /pmc/articles/PMC1274322/ /pubmed/16219102 http://dx.doi.org/10.1186/1472-6947-5-32 Text en Copyright © 2005 Sauleau et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Medical record linkage in health information systems by approximate string matching and clustering
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1274322/ https://www.ncbi.nlm.nih.gov/pubmed/16219102 http://dx.doi.org/10.1186/1472-6947-5-32
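The three steps named in the METHODS part of the abstract (standardise and block, score record pairs, cluster) can be illustrated with a small sketch. This is not the authors' implementation. It assumes hypothetical record fields (`last`, `first`, `dob`), a hypothetical blocking key (first letter of the surname plus birth year), a global similarity taken as the plain mean of per-field scores, and an arbitrary threshold of 0.9. It uses the standard Jaro-Winkler measure only, without whatever the "Porter" component of the paper's algorithm adds, and it replaces the paper's agglomerative and partitioning methods with simple connected components over the match graph.

```python
from collections import defaultdict
from itertools import combinations


def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro score boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * p * (1 - score)


def normalise(value):
    """Step 1a (assumed rule): upper-case and keep alphanumerics only."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def block_key(rec):
    """Step 1b (assumed key): surname initial plus year of birth."""
    return (normalise(rec["last"])[:1], rec["dob"][:4])


def similarity(a, b, fields=("last", "first")):
    """Step 2 (assumed aggregation): mean Jaro-Winkler over name fields."""
    scores = [jaro_winkler(normalise(a[f]), normalise(b[f])) for f in fields]
    return sum(scores) / len(scores)


def cluster(records, threshold=0.9):
    """Step 3 (stand-in): connected components of the match graph."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    # Only pairs inside the same block are compared.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Made-up identities, for illustration only.
records = [
    {"last": "Martin", "first": "Jean", "dob": "1950-03-12"},
    {"last": "Martin", "first": "Jaen", "dob": "1950-03-12"},
    {"last": "Muller", "first": "Anna", "dob": "1950-07-01"},
    {"last": "Durand", "first": "Paul", "dob": "1962-01-20"},
]
print(cluster(records))
```

Blocking keeps the number of pairwise comparisons manageable, at the cost of missing duplicates whose blocking fields themselves differ. A real system would likely run several blocking passes with different keys and tune the threshold against manually reviewed pairs.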


Similar items