Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network
OBJECTIVE: To implement an open-source tool that performs deterministic privacy-preserving record linkage (RL) in a real-world setting within a large research network. MATERIALS AND METHODS: We learned 2 efficient deterministic linkage rules using publicly available voter registration data. We then...
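Main authors: | Bian, Jiang; Loiacono, Alexander; Sura, Andrei; Mendoza Viramontes, Tonatiuh; Lipori, Gloria; Guo, Yi; Shenkman, Elizabeth; Hogan, William |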
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994009/ https://www.ncbi.nlm.nih.gov/pubmed/32025654 http://dx.doi.org/10.1093/jamiaopen/ooz050 |
_version_ | 1783493135039463424 |
---|---|
author | Bian, Jiang Loiacono, Alexander Sura, Andrei Mendoza Viramontes, Tonatiuh Lipori, Gloria Guo, Yi Shenkman, Elizabeth Hogan, William |
author_sort | Bian, Jiang |
collection | PubMed |
description | OBJECTIVE: To implement an open-source tool that performs deterministic privacy-preserving record linkage (RL) in a real-world setting within a large research network. MATERIALS AND METHODS: We learned 2 efficient deterministic linkage rules using publicly available voter registration data. We then validated the 2 rules’ performance with 2 manually curated gold-standard datasets linking electronic health records and claims data from 2 sources. We developed an open-source Python-based tool, OneFL Deduper, that (1) creates seeded hash codes of combinations of patients’ quasi-identifiers using a cryptographic one-way hash function to achieve privacy protection and (2) links and deduplicates patient records using a central broker through matching of hash codes with high precision and reasonable recall. RESULTS: We deployed OneFL Deduper (https://github.com/ufbmi/onefl-deduper) in OneFlorida, a state-based clinical research network that is part of the national Patient-Centered Clinical Research Network (PCORnet). Using the gold-standard datasets, we achieved a precision of 97.25% to 99.7% and a recall of 75.5%. With the tool, we deduplicated approximately 3.5 million (out of approximately 15 million) records down to 1.7 million unique patients across 6 health care partners and the Florida Medicaid program. We demonstrated the benefits of RL by examining different disease profiles of the linked cohorts. CONCLUSIONS: Many factors, including privacy risk considerations, policies and regulations, data availability and quality, and computing resources, can affect how an RL solution is constructed in a real-world setting. Nevertheless, RL is a significant task in improving data quality in a network so that we can make reliable scientific discoveries from these massive data resources. |
format | Online Article Text |
id | pubmed-6994009 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6994009 2020-02-05 Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network Bian, Jiang Loiacono, Alexander Sura, Andrei Mendoza Viramontes, Tonatiuh Lipori, Gloria Guo, Yi Shenkman, Elizabeth Hogan, William JAMIA Open Research and Applications Oxford University Press 2019-09-27 /pmc/articles/PMC6994009/ /pubmed/32025654 http://dx.doi.org/10.1093/jamiaopen/ooz050 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994009/ https://www.ncbi.nlm.nih.gov/pubmed/32025654 http://dx.doi.org/10.1093/jamiaopen/ooz050 |
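
The abstract describes two steps: each site creates seeded hash codes from combinations of quasi-identifiers, and a central broker links records whose hash codes match. The following is a minimal Python sketch of that idea and not the OneFL Deduper implementation. The two rules, the field names, the normalization, the choice of SHA-256, and the helper names (`normalize`, `hash_codes`, `link`) are all assumptions made for illustration; this record does not state which rules or hash function the authors used.

```python
import hashlib
import unicodedata

# Hypothetical linkage rules (the record does not list the two rules the
# authors learned): each rule names the quasi-identifiers hashed together.
RULES = {
    "rule_1": ("first_name", "last_name", "dob", "sex"),
    "rule_2": ("first_name", "last_name", "dob", "zip"),
}


def normalize(value: str) -> str:
    """Lowercase and keep only letters and digits, dropping accents."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).lower()


def hash_codes(record: dict, seed: str) -> dict:
    """Return one seeded SHA-256 hex digest per rule whose fields are all present."""
    codes = {}
    for rule, fields in RULES.items():
        parts = [normalize(str(record.get(field) or "")) for field in fields]
        if all(parts):  # skip a rule when any quasi-identifier is missing
            payload = "|".join([seed, rule, *parts])
            codes[rule] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return codes


def link(submissions):
    """Broker side: group records that share a hash code under any rule.

    submissions is an iterable of (site, local_id, codes); the broker sees
    only hash codes, never the underlying identifiers. Returns a dict that
    maps (site, local_id) to a cluster representative.
    """
    parent = {}

    def find(key):
        # Union-find lookup with path halving.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    first_seen = {}  # (rule, code) -> first record that submitted it
    for site, local_id, codes in submissions:
        key = (site, local_id)
        parent.setdefault(key, key)
        for rule, code in codes.items():
            other = first_seen.setdefault((rule, code), key)
            parent[find(key)] = find(other)  # merge the two clusters
    return {key: find(key) for key in parent}


SEED = "shared-secret-seed"  # placeholder; sites would agree on this privately
site_a = {"first_name": "José", "last_name": "O'Brien",
          "dob": "1980-01-02", "sex": "M", "zip": "32601"}
site_b = {"first_name": " jose", "last_name": "OBrien",
          "dob": "1980-01-02", "sex": "m", "zip": None}
other = {"first_name": "Ana", "last_name": "Diaz",
         "dob": "1975-06-30", "sex": "F", "zip": "33101"}

submissions = [
    ("A", "1", hash_codes(site_a, SEED)),
    ("B", "9", hash_codes(site_b, SEED)),
    ("A", "2", hash_codes(other, SEED)),
]
clusters = link(submissions)
print(len(set(clusters.values())))  # 2 unique patients from 3 records
```

In this sketch the first two records link through `rule_1` even though site B lacks a ZIP code, which illustrates why more than one rule can help when data completeness varies. Exact-match hashing of this kind is deterministic, so any difference that survives normalization (for example a misspelled name) produces a different hash code and a missed link.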