
Comparing Methods for Record Linkage for Public Health Action: Matching Algorithm Validation Study

BACKGROUND: Many public health departments use record linkage between surveillance data and external data sources to inform public health interventions. However, little guidance is available to inform these activities, and many health departments rely on deterministic algorithms that may miss many t...


Bibliographic Details
Main Authors: Avoundjian, Tigran; Dombrowski, Julia C; Golden, Matthew R; Hughes, James P; Guthrie, Brandon L; Baseman, Janet; Sadinle, Mauricio
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7226047/
https://www.ncbi.nlm.nih.gov/pubmed/32352389
http://dx.doi.org/10.2196/15917
_version_ 1783534196152598528
author Avoundjian, Tigran
Dombrowski, Julia C
Golden, Matthew R
Hughes, James P
Guthrie, Brandon L
Baseman, Janet
Sadinle, Mauricio
author_facet Avoundjian, Tigran
Dombrowski, Julia C
Golden, Matthew R
Hughes, James P
Guthrie, Brandon L
Baseman, Janet
Sadinle, Mauricio
author_sort Avoundjian, Tigran
collection PubMed
description BACKGROUND: Many public health departments use record linkage between surveillance data and external data sources to inform public health interventions. However, little guidance is available to inform these activities, and many health departments rely on deterministic algorithms that may miss many true matches. In the context of public health action, these missed matches lead to missed opportunities to deliver interventions and may exacerbate existing health inequities.
OBJECTIVE: This study aimed to compare the performance of record linkage algorithms commonly used in public health practice.
METHODS: We compared five deterministic (exact, Stenger, Ocampo 1, Ocampo 2, and Bosh) and two probabilistic record linkage algorithms (fastLink and beta record linkage [BRL]) using simulations and a real-world scenario. We simulated pairs of datasets with varying numbers of errors per record and the number of matching records between the two datasets (ie, overlap). We matched the datasets using each algorithm and calculated their recall (ie, sensitivity, the proportion of true matches identified by the algorithm) and precision (ie, positive predictive value, the proportion of matches identified by the algorithm that were true matches). We estimated the average computation time by performing a match with each algorithm 20 times while varying the size of the datasets being matched. In a real-world scenario, HIV and sexually transmitted disease surveillance data from King County, Washington, were matched to identify people living with HIV who had a syphilis diagnosis in 2017. We calculated the recall and precision of each algorithm compared with a composite standard based on the agreement in matching decisions across all the algorithms and manual review.
RESULTS: In simulations, BRL and fastLink maintained a high recall at nearly all data quality levels, while being comparable with deterministic algorithms in terms of precision. Deterministic algorithms typically failed to identify matches in scenarios with low data quality. All the deterministic algorithms had a shorter average computation time than the probabilistic algorithms. BRL had the slowest overall computation time (14 min when both datasets contained 2000 records). In the real-world scenario, BRL had the lowest trade-off between recall (309/309, 100.0%) and precision (309/312, 99.0%).
CONCLUSIONS: Probabilistic record linkage algorithms maximize the number of true matches identified, reducing gaps in the coverage of interventions and maximizing the reach of public health action.
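The recall and precision definitions given in the abstract can be illustrated with a short Python sketch. This is not the authors' code: the function name `recall_precision`, the representation of a match as a pair of record ids, and the example ids are all assumptions made for illustration. Only the counts (309 true matches, 312 links returned, all 309 true matches found) come from the BRL result reported above.

```python
from typing import Set, Tuple

# A link is a pair: (record id in dataset A, record id in dataset B).
# This representation is an assumption for the sketch, not the study's format.
Pair = Tuple[int, int]


def recall_precision(true_matches: Set[Pair], predicted: Set[Pair]) -> Tuple[float, float]:
    """Return (recall, precision) for a set of predicted links.

    recall    = true matches found / all true matches
    precision = true matches found / all links the algorithm returned
    """
    true_positives = len(true_matches & predicted)
    recall = true_positives / len(true_matches) if true_matches else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical ids arranged to reproduce the reported BRL counts:
# 309 true matches, all found, plus 3 false links (312 links in total).
truth = {(i, i) for i in range(309)}
linked = truth | {(i, i + 1000) for i in range(3)}

recall, precision = recall_precision(truth, linked)
print(f"recall={recall:.1%} precision={precision:.1%}")
# prints: recall=100.0% precision=99.0%
```

In the study, the reference set in the real-world scenario was a composite standard rather than known ground truth, so `truth` here stands in for that standard.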
format Online
Article
Text
id pubmed-7226047
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7226047 2020-05-19 Comparing Methods for Record Linkage for Public Health Action: Matching Algorithm Validation Study Avoundjian, Tigran Dombrowski, Julia C Golden, Matthew R Hughes, James P Guthrie, Brandon L Baseman, Janet Sadinle, Mauricio JMIR Public Health Surveill Original Paper JMIR Publications 2020-04-30 /pmc/articles/PMC7226047/ /pubmed/32352389 http://dx.doi.org/10.2196/15917 Text en ©Tigran Avoundjian, Julia C Dombrowski, Matthew R Golden, James P Hughes, Brandon L Guthrie, Janet Baseman, Mauricio Sadinle. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 30.04.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
spellingShingle Original Paper
Avoundjian, Tigran
Dombrowski, Julia C
Golden, Matthew R
Hughes, James P
Guthrie, Brandon L
Baseman, Janet
Sadinle, Mauricio
Comparing Methods for Record Linkage for Public Health Action: Matching Algorithm Validation Study
title Comparing Methods for Record Linkage for Public Health Action: Matching Algorithm Validation Study
title_sort comparing methods for record linkage for public health action: matching algorithm validation study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7226047/
https://www.ncbi.nlm.nih.gov/pubmed/32352389
http://dx.doi.org/10.2196/15917