Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network

Bibliographic details
Main authors: Bian, Jiang; Loiacono, Alexander; Sura, Andrei; Mendoza Viramontes, Tonatiuh; Lipori, Gloria; Guo, Yi; Shenkman, Elizabeth; Hogan, William
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994009/
https://www.ncbi.nlm.nih.gov/pubmed/32025654
http://dx.doi.org/10.1093/jamiaopen/ooz050
collection PubMed
description OBJECTIVE: To implement an open-source tool that performs deterministic privacy-preserving record linkage (RL) in a real-world setting within a large research network.
MATERIALS AND METHODS: We learned 2 efficient deterministic linkage rules using publicly available voter registration data. We then validated the 2 rules’ performance with 2 manually curated gold-standard datasets linking electronic health records and claims data from 2 sources. We developed an open-source Python-based tool, OneFL Deduper, that (1) creates seeded hash codes of combinations of patients’ quasi-identifiers using a cryptographic one-way hash function to achieve privacy protection and (2) links and deduplicates patient records through a central broker by matching hash codes, with high precision and reasonable recall.
RESULTS: We deployed OneFL Deduper (https://github.com/ufbmi/onefl-deduper) in OneFlorida, a state-based clinical research network that is part of the national Patient-Centered Clinical Research Network (PCORnet). On the gold-standard datasets, we achieved a precision of 97.25% to 99.7% and a recall of 75.5%. With the tool, we deduplicated ∼3.5 million (out of ∼15 million) records down to 1.7 million unique patients across 6 health care partners and the Florida Medicaid program. We demonstrated the benefits of RL by examining the different disease profiles of the linked cohorts.
CONCLUSIONS: Many factors, including privacy risk considerations, policies and regulations, data availability and quality, and computing resources, can affect how an RL solution is constructed in a real-world setting. Nevertheless, RL is an important task in improving data quality across a network so that we can make reliable scientific discoveries from these massive data resources.
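The two steps in the description (seeded hashing of quasi-identifier combinations at each partner, then hash matching at a central broker) can be sketched in Python. This is a minimal illustration under stated assumptions, not OneFL Deduper's actual code: the rules in `RULES` are hypothetical placeholders (the description does not name the 2 learned rules), the `normalize` step is an assumption, and the seed is applied with HMAC-SHA256, one common way to build a seeded one-way hash. In this sketch the broker function `link` only ever sees digests, never the seed or raw identifiers, and it groups records that share a digest under the same rule with a union-find structure.

```python
import hashlib
import hmac

# Hypothetical linkage rules (NOT the 2 rules learned in the paper): each rule
# names the quasi-identifier fields that are normalized, joined, and hashed.
RULES = {
    "rule_1": ("first_name", "last_name", "dob", "sex"),
    "rule_2": ("last_name", "dob", "sex", "zip"),
}


def normalize(value):
    """Assumed normalization: upper-case and keep only letters and digits."""
    return "".join(ch for ch in str(value).upper() if ch.isalnum())


def hash_record(record, seed):
    """Partner side: return {rule: hex digest} for each rule with no missing field."""
    codes = {}
    for rule, fields in RULES.items():
        values = [normalize(record.get(field) or "") for field in fields]
        if not all(values):
            continue  # a missing quasi-identifier disables this rule for the record
        payload = "|".join(values).encode("utf-8")
        codes[rule] = hmac.new(seed, payload, hashlib.sha256).hexdigest()
    return codes


class UnionFind:
    """Disjoint-set forest used by the broker to group matching records."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def link(submissions):
    """Broker side: map each (partner, local_id) to a patient index.

    `submissions` yields (partner, local_id, codes) tuples. Two records are
    linked when they share the same digest under the same rule; linking is
    transitive, so chains of matches collapse into one patient.
    """
    groups = UnionFind()
    first_seen = {}  # (rule, digest) -> first record key that carried it
    for partner, local_id, codes in submissions:
        key = (partner, local_id)
        groups.find(key)  # register records that match nothing
        for rule, digest in codes.items():
            owner = first_seen.setdefault((rule, digest), key)
            if owner != key:
                groups.union(owner, key)
    patient_ids = {}
    return {key: patient_ids.setdefault(groups.find(key), len(patient_ids))
            for key in list(groups.parent)}


if __name__ == "__main__":
    seed = b"example-seed"  # hypothetical secret shared by the partners only
    jane_a = {"first_name": "Jane", "last_name": "Doe", "dob": "1980-01-02",
              "sex": "F", "zip": "32601"}
    jane_b = {"first_name": " jane", "last_name": "DOE", "dob": "1980-01-02",
              "sex": "F"}  # no zip, so only rule_1 can fire
    john = {"first_name": "John", "last_name": "Roe", "dob": "1975-05-05",
            "sex": "M", "zip": "32608"}
    submissions = [
        ("site_A", "1", hash_record(jane_a, seed)),
        ("site_B", "9", hash_record(jane_b, seed)),
        ("medicaid", "x", hash_record(john, seed)),
    ]
    print(link(submissions))
    # {('site_A', '1'): 0, ('site_B', '9'): 0, ('medicaid', 'x'): 1}
```

Counting the distinct values returned by `link` gives the number of unique patients, the kind of figure reported above. Because matching is exact on digests, a single typo in any field of a rule defeats that rule, which is why deterministic hash matching tends to favor precision over recall.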
id pubmed-6994009
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: JAMIA Open (Research and Applications). Oxford University Press, 2019-09-27.
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Research and Applications