
Comparing record linkage software programs and algorithms using real-world data


Bibliographic Details
Main Authors:	Karr, Alan F., Taylor, Matthew T., West, Suzanne L., Setoguchi, Soko, Kou, Tzuyung D., Gerhard, Tobias, Horton, Daniel B.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6759179/ https://www.ncbi.nlm.nih.gov/pubmed/31550255 http://dx.doi.org/10.1371/journal.pone.0221459

author	Karr, Alan F. Taylor, Matthew T. West, Suzanne L. Setoguchi, Soko Kou, Tzuyung D. Gerhard, Tobias Horton, Daniel B.
collection	PubMed
description	Linkage of medical databases, including insurer claims and electronic health records (EHRs), is increasingly common. However, few studies have investigated the behavior and output of linkage software. To determine how linkage quality is affected by different algorithms, blocking variables, methods for string matching and weight determination, and decision rules, we compared the performance of 4 nonproprietary linkage software packages linking patient identifiers from noninteroperable inpatient and outpatient EHRs. We linked datasets using first and last name, gender, and date of birth (DOB). We evaluated DOB and year of birth (YOB) as blocking variables and used exact and inexact matching methods. We compared the weights assigned to record pairs and evaluated how matching weights corresponded to a gold standard (medical record number). The deduplicated inpatient and outpatient datasets contained 69,523 and 176,154 records, respectively. Linkage runs blocking on DOB produced between 8 distinct weights for exact matching and 64,273 for inexact matching. Linkage runs blocking on YOB produced between 8 and 916,806 distinct weights. With exact matching, the highest ranked group of record pairs had identical test characteristics (sensitivity 90.48%, specificity 99.78%), but algorithms differentially prioritized certain variables. Inexact matching behaved more variably, leading to dramatic differences in sensitivity (range 0.04–93.36%) and positive predictive value (PPV) (range 86.67–97.35%), even for the most highly ranked record pairs. Blocking on DOB led to higher PPV among highly ranked record pairs. An ensemble approach based on averaging scaled matching weights led to modestly improved accuracy. In summary, we found few differences in the rankings of record pairs with the highest matching weights across 4 linkage packages. Performance was more consistent for exact string matching than for inexact string matching. Most methods and software packages performed similarly when their matching accuracy was compared against the gold standard. In some settings, an ensemble matching approach may outperform individual linkage algorithms.
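The abstract names blocking variables (DOB vs. YOB), exact vs. inexact string matching, and matching weights, but it does not say which weighting scheme each package uses. The following is a minimal sketch assuming classic Fellegi-Sunter log-likelihood weights with hand-picked m/u probabilities rather than probabilities estimated from data. It uses Python's `difflib.SequenceMatcher` ratio as a stand-in inexact comparator, not the comparator of any of the 4 packages. The field names, m/u values, similarity threshold, dates in ISO `YYYY-MM-DD` form, and all function names are hypothetical.

```python
"""Sketch: blocking plus Fellegi-Sunter-style match weights.

Assumptions (not from the paper): the m/u probabilities are made-up
illustrative values, and difflib's SequenceMatcher ratio stands in for
whatever inexact string comparator a given package actually uses.
"""
import math
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product

# Hypothetical m = P(agree | true match), u = P(agree | non-match) per field.
M_U = {"first": (0.95, 0.01), "last": (0.97, 0.005), "gender": (0.99, 0.5)}


def block(records, key):
    """Group records by a blocking key (e.g. full DOB or year of birth)."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return groups


def agrees(a, b, exact, threshold=0.85):
    """Exact equality, or a similarity ratio at or above a threshold."""
    if exact:
        return a == b
    return SequenceMatcher(None, a, b).ratio() >= threshold


def weight(r1, r2, exact=True):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if agrees(r1[field].lower(), r2[field].lower(), exact):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def score_pairs(left, right, key, exact=True):
    """Score only the record pairs that share a blocking-key value."""
    left_blocks, right_blocks = block(left, key), block(right, key)
    scores = {}
    for k in left_blocks.keys() & right_blocks.keys():
        for r1, r2 in product(left_blocks[k], right_blocks[k]):
            scores[(r1["id"], r2["id"])] = weight(r1, r2, exact)
    return scores


def dob_key(rec):
    """Block on the full date of birth."""
    return rec["dob"]


def yob_key(rec):
    """Block on year of birth only (first 4 characters of an ISO date)."""
    return rec["dob"][:4]


# Tiny illustrative records (hypothetical, not study data).
inpatient = [{"id": "i1", "first": "Ann", "last": "Smith", "gender": "F", "dob": "1980-02-03"}]
outpatient = [
    {"id": "o1", "first": "Anne", "last": "Smith", "gender": "F", "dob": "1980-02-03"},
    {"id": "o2", "first": "Ann", "last": "Smyth", "gender": "F", "dob": "1980-07-11"},
]
by_dob = score_pairs(inpatient, outpatient, dob_key)  # compares (i1, o1) only
by_yob = score_pairs(inpatient, outpatient, yob_key)  # compares (i1, o1) and (i1, o2)
```

Blocking on YOB lets more candidate pairs through than blocking on DOB, which is consistent with the abstract's larger weight counts under YOB blocking. Inexact matching lets "Ann" and "Anne" agree, so the pair receives a higher weight than under exact matching. For brevity, DOB serves here only as a blocking key and is not also a compared field, although the study linked on DOB.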
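The abstract reports that averaging scaled matching weights across packages modestly improved accuracy, with results judged against medical record number as the gold standard. The sketch below illustrates that idea under stated assumptions. Min-max scaling to [0, 1] is an assumption, because the abstract does not say how weights were scaled. Pairs that any algorithm failed to score are dropped. The weights, pair IDs, threshold, and gold-standard set are invented for illustration.

```python
"""Sketch: ensemble of scaled match weights, scored against a gold standard.

Assumptions: min-max scaling is one plausible reading of "scaled";
all weights, pair IDs, and the gold-standard set are hypothetical.
"""


def min_max_scale(weights):
    """Rescale one algorithm's pair weights to the range [0, 1]."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all weights are equal
    return {pair: (w - lo) / span for pair, w in weights.items()}


def ensemble(*weight_sets):
    """Average the scaled weights for pairs scored by every algorithm."""
    scaled = [min_max_scale(ws) for ws in weight_sets]
    common = set.intersection(*(set(s) for s in scaled))
    return {pair: sum(s[pair] for s in scaled) / len(scaled) for pair in common}


def evaluate(scores, threshold, true_pairs):
    """Sensitivity and PPV when pairs scoring >= threshold are declared links.

    true_pairs would be, e.g., pairs whose medical record numbers agree.
    """
    predicted = {pair for pair, s in scores.items() if s >= threshold}
    tp = len(predicted & true_pairs)
    sensitivity = tp / len(true_pairs) if true_pairs else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical weights from two algorithms on very different scales.
algo_a = {("i1", "o1"): 12.0, ("i1", "o2"): -3.0, ("i2", "o3"): 5.0}
algo_b = {("i1", "o1"): 0.9, ("i1", "o2"): 0.4, ("i2", "o3"): 0.2}
gold = {("i1", "o1"), ("i2", "o3")}

combined = ensemble(algo_a, algo_b)
sens, ppv = evaluate(combined, 0.5, gold)
```

Scaling before averaging keeps one package's wider weight range from dominating the ensemble. Because `true_pairs` can include pairs that blocking never compared, sensitivity computed this way also penalizes true matches lost to blocking.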
format	Online Article Text
id	pubmed-6759179
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-6759179 2019-10-04 PLoS One Research Article Public Library of Science 2019-09-24 /pmc/articles/PMC6759179/ /pubmed/31550255 http://dx.doi.org/10.1371/journal.pone.0221459 Text en © 2019 Karr et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Comparing record linkage software programs and algorithms using real-world data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6759179/ https://www.ncbi.nlm.nih.gov/pubmed/31550255 http://dx.doi.org/10.1371/journal.pone.0221459


Similar Items