Real-World Matching Performance of Deidentified Record-Linking Tokens
Objective: Our objective was to evaluate tokens commonly used by clinical research consortia to aggregate clinical data across institutions. Methods: This study compares tokens alone and token-based matching algorithms against manual annotation for 20,002 record pairs extracted from the University o...
Main Authors: | Bernstam, Elmer V.; Applegate, Reuben Joseph; Yu, Alvin; Chaudhari, Deepa; Liu, Tian; Coda, Alex; Leshin, Jonah |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Georg Thieme Verlag KG, 2022 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9474266/ https://www.ncbi.nlm.nih.gov/pubmed/35896508 http://dx.doi.org/10.1055/a-1910-4154 |
_version_ | 1784789678773239808 |
---|---|
author | Bernstam, Elmer V. Applegate, Reuben Joseph Yu, Alvin Chaudhari, Deepa Liu, Tian Coda, Alex Leshin, Jonah |
author_facet | Bernstam, Elmer V. Applegate, Reuben Joseph Yu, Alvin Chaudhari, Deepa Liu, Tian Coda, Alex Leshin, Jonah |
author_sort | Bernstam, Elmer V. |
collection | PubMed |
description | Objective: Our objective was to evaluate tokens commonly used by clinical research consortia to aggregate clinical data across institutions. Methods: This study compares tokens alone and token-based matching algorithms against manual annotation for 20,002 record pairs extracted from the University of Texas Houston's clinical data warehouse (CDW) in terms of entity resolution. Results: The highest precision achieved was 99.9% with a token derived from the first name, last name, gender, and date-of-birth. The highest recall achieved was 95.5% with an algorithm involving tokens that reflected combinations of first name, last name, gender, date-of-birth, and social security number. Discussion: To protect the privacy of patient data, information must be removed from a health care dataset to obscure the identity of individuals from which that data were derived. However, once identifying information is removed, records can no longer be linked to the same entity to enable analyses. Tokens are a mechanism to convert patient identifying information into Health Insurance Portability and Accountability Act-compliant deidentified elements that can be used to link clinical records, while preserving patient privacy. Conclusion: Depending on the availability and accuracy of the underlying data, tokens are able to resolve and link entities at a high level of precision and recall for real-world data derived from a CDW. |
format | Online Article Text |
id | pubmed-9474266 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Georg Thieme Verlag KG |
record_format | MEDLINE/PubMed |
spelling | pubmed-94742662022-10-05 Real-World Matching Performance of Deidentified Record-Linking Tokens Bernstam, Elmer V. Applegate, Reuben Joseph Yu, Alvin Chaudhari, Deepa Liu, Tian Coda, Alex Leshin, Jonah Appl Clin Inform Objective Our objective was to evaluate tokens commonly used by clinical research consortia to aggregate clinical data across institutions. Methods This study compares tokens alone and token-based matching algorithms against manual annotation for 20,002 record pairs extracted from the University of Texas Houston's clinical data warehouse (CDW) in terms of entity resolution. Results The highest precision achieved was 99.9% with a token derived from the first name, last name, gender, and date-of-birth. The highest recall achieved was 95.5% with an algorithm involving tokens that reflected combinations of first name, last name, gender, date-of-birth, and social security number. Discussion To protect the privacy of patient data, information must be removed from a health care dataset to obscure the identity of individuals from which that data were derived. However, once identifying information is removed, records can no longer be linked to the same entity to enable analyses. Tokens are a mechanism to convert patient identifying information into Health Insurance Portability and Accountability Act-compliant deidentified elements that can be used to link clinical records, while preserving patient privacy. Conclusion Depending on the availability and accuracy of the underlying data, tokens are able to resolve and link entities at a high level of precision and recall for real-world data derived from a CDW. Georg Thieme Verlag KG 2022-09-14 /pmc/articles/PMC9474266/ /pubmed/35896508 http://dx.doi.org/10.1055/a-1910-4154 Text en The Author(s). 
This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. ( https://creativecommons.org/licenses/by-nc-nd/4.0/ ) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited. |
spellingShingle | Bernstam, Elmer V. Applegate, Reuben Joseph Yu, Alvin Chaudhari, Deepa Liu, Tian Coda, Alex Leshin, Jonah Real-World Matching Performance of Deidentified Record-Linking Tokens |
title | Real-World Matching Performance of Deidentified Record-Linking Tokens |
title_full | Real-World Matching Performance of Deidentified Record-Linking Tokens |
title_fullStr | Real-World Matching Performance of Deidentified Record-Linking Tokens |
title_full_unstemmed | Real-World Matching Performance of Deidentified Record-Linking Tokens |
title_short | Real-World Matching Performance of Deidentified Record-Linking Tokens |
title_sort | real-world matching performance of deidentified record-linking tokens |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9474266/ https://www.ncbi.nlm.nih.gov/pubmed/35896508 http://dx.doi.org/10.1055/a-1910-4154 |
work_keys_str_mv | AT bernstamelmerv realworldmatchingperformanceofdeidentifiedrecordlinkingtokens AT applegatereubenjoseph realworldmatchingperformanceofdeidentifiedrecordlinkingtokens AT yualvin realworldmatchingperformanceofdeidentifiedrecordlinkingtokens AT chaudharideepa realworldmatchingperformanceofdeidentifiedrecordlinkingtokens AT liutian realworldmatchingperformanceofdeidentifiedrecordlinkingtokens AT codaalex realworldmatchingperformanceofdeidentifiedrecordlinkingtokens AT leshinjonah realworldmatchingperformanceofdeidentifiedrecordlinkingtokens |
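
The description above mentions tokens built from first name, last name, gender, and date-of-birth, scored by precision and recall against manual annotation. The following is only an illustrative sketch of that general idea, not the method evaluated in the article: the record does not say how the tokens are computed, so the keyed SHA-256 hash, the normalization rule, the `SECRET_KEY` value, and the function names `normalize`, `make_token`, and `precision_recall` are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical shared secret, for illustration only (an assumption, not from the record).
SECRET_KEY = b"example-secret-not-for-production"


def normalize(value: str) -> str:
    """Uppercase the value and drop every non-alphanumeric character."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def make_token(first: str, last: str, gender: str, dob: str) -> str:
    """Return a keyed SHA-256 hash over the four normalized fields.

    Assumed scheme: the article's actual token construction is not
    described in this record.
    """
    message = "|".join(normalize(part) for part in (first, last, gender, dob))
    return hmac.new(SECRET_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Score predicted matches against reference (e.g., manual) labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up record pairs with made-up reference labels (True = same person).
pairs = [
    (("Ann", "Lee", "F", "1980-01-02"), (" ann", "LEE", "f", "1980/01/02"), True),
    (("Ann", "Lee", "F", "1980-01-02"), ("Anne", "Lee", "F", "1980-01-02"), True),
    (("Bob", "Ray", "M", "1975-05-05"), ("Bob", "Roy", "M", "1975-05-05"), False),
]

# A pair is predicted to match when both records yield the same token.
predicted = [make_token(*a) == make_token(*b) for a, b, _ in pairs]
actual = [label for _, _, label in pairs]
print(precision_recall(predicted, actual))
```

In this toy data, the spelling variant "Anne" produces a different token and is missed, which illustrates why an exact-match token can be precise while still losing recall when the underlying fields are inconsistent.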