
Perceived Risk of Re-Identification in OMOP-CDM Database: A Cross-Sectional Survey

BACKGROUND: The advancement of information technology has immensely increased the quality and volume of health data. This has led to an increase in observational study, as well as to the threat of privacy invasion. Recently, a distributed research network based on the common data model (CDM) has eme...


Bibliographic Details
Main Authors: Tak, Yae Won; You, Seng Chan; Han, Jeong Hyun; Kim, Soon-Seok; Kim, Gi-Tae; Lee, Yura
Format: Online Article Text
Language: English
Published: The Korean Academy of Medical Sciences 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9259248/
https://www.ncbi.nlm.nih.gov/pubmed/35790207
http://dx.doi.org/10.3346/jkms.2022.37.e205
_version_ 1784741733805850624
author Tak, Yae Won
You, Seng Chan
Han, Jeong Hyun
Kim, Soon-Seok
Kim, Gi-Tae
Lee, Yura
author_facet Tak, Yae Won
You, Seng Chan
Han, Jeong Hyun
Kim, Soon-Seok
Kim, Gi-Tae
Lee, Yura
author_sort Tak, Yae Won
collection PubMed
description BACKGROUND: The advancement of information technology has immensely increased the quality and volume of health data. This has led to an increase in observational studies, as well as to the threat of privacy invasion. Recently, a distributed research network based on the common data model (CDM) has emerged, enabling collaborative international medical research without sharing patient-level data. Although the CDM database for each institution is built inside a firewall, the risk of re-identification requires management. Hence, this study aims to elucidate the perceptions CDM users have towards CDM and risk management for re-identification. METHODS: The survey, designed to answer specific in-depth questions on CDM, was conducted from October to November 2020. We targeted well-experienced researchers who actively use CDM. Basic statistics (total number and percent) were computed for all covariates. RESULTS: There were 33 valid respondents. Of these, 43.8% suggested additional anonymization was unnecessary beyond the “minimum cell count” policy, which obscures any cell with a value lower than a certain number (usually 5) in shared results to minimize the liability of re-identification due to rare conditions. During extract-transform-load processes, 81.8% of respondents assumed that the re-identification risk of structured data is under control. However, respondents noted that dates of birth and death were highly re-identifiable information. The majority of respondents (n = 22, 66.7%) conceded the possibility of unstructured data containing identifiers in the NOTE table. CONCLUSION: Overall, CDM users generally attributed high reliability for privacy protection to the intrinsic nature of CDM. There was little demand for additional de-identification methods. However, unstructured data in the CDM were suspected to carry risks. The need for a coordinating consortium to define and manage the re-identification risk of CDM was emphasized.
format Online
Article
Text
id pubmed-9259248
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher The Korean Academy of Medical Sciences
record_format MEDLINE/PubMed
spelling pubmed-9259248 2022-07-18 Perceived Risk of Re-Identification in OMOP-CDM Database: A Cross-Sectional Survey Tak, Yae Won You, Seng Chan Han, Jeong Hyun Kim, Soon-Seok Kim, Gi-Tae Lee, Yura J Korean Med Sci Original Article BACKGROUND: The advancement of information technology has immensely increased the quality and volume of health data. This has led to an increase in observational studies, as well as to the threat of privacy invasion. Recently, a distributed research network based on the common data model (CDM) has emerged, enabling collaborative international medical research without sharing patient-level data. Although the CDM database for each institution is built inside a firewall, the risk of re-identification requires management. Hence, this study aims to elucidate the perceptions CDM users have towards CDM and risk management for re-identification. METHODS: The survey, designed to answer specific in-depth questions on CDM, was conducted from October to November 2020. We targeted well-experienced researchers who actively use CDM. Basic statistics (total number and percent) were computed for all covariates. RESULTS: There were 33 valid respondents. Of these, 43.8% suggested additional anonymization was unnecessary beyond the “minimum cell count” policy, which obscures any cell with a value lower than a certain number (usually 5) in shared results to minimize the liability of re-identification due to rare conditions. During extract-transform-load processes, 81.8% of respondents assumed that the re-identification risk of structured data is under control. However, respondents noted that dates of birth and death were highly re-identifiable information. The majority of respondents (n = 22, 66.7%) conceded the possibility of unstructured data containing identifiers in the NOTE table. CONCLUSION: Overall, CDM users generally attributed high reliability for privacy protection to the intrinsic nature of CDM. There was little demand for additional de-identification methods. However, unstructured data in the CDM were suspected to carry risks. The need for a coordinating consortium to define and manage the re-identification risk of CDM was emphasized. The Korean Academy of Medical Sciences 2022-06-20 /pmc/articles/PMC9259248/ /pubmed/35790207 http://dx.doi.org/10.3346/jkms.2022.37.e205 Text en © 2022 The Korean Academy of Medical Sciences. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Article
Tak, Yae Won
You, Seng Chan
Han, Jeong Hyun
Kim, Soon-Seok
Kim, Gi-Tae
Lee, Yura
Perceived Risk of Re-Identification in OMOP-CDM Database: A Cross-Sectional Survey
title Perceived Risk of Re-Identification in OMOP-CDM Database: A Cross-Sectional Survey
title_full Perceived Risk of Re-Identification in OMOP-CDM Database: A Cross-Sectional Survey
title_fullStr Perceived Risk of Re-Identification in OMOP-CDM Database: A Cross-Sectional Survey
title_full_unstemmed Perceived Risk of Re-Identification in OMOP-CDM Database: A Cross-Sectional Survey
title_short Perceived Risk of Re-Identification in OMOP-CDM Database: A Cross-Sectional Survey
title_sort perceived risk of re-identification in omop-cdm database: a cross-sectional survey
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9259248/
https://www.ncbi.nlm.nih.gov/pubmed/35790207
http://dx.doi.org/10.3346/jkms.2022.37.e205
work_keys_str_mv AT takyaewon perceivedriskofreidentificationinomopcdmdatabaseacrosssectionalsurvey
AT yousengchan perceivedriskofreidentificationinomopcdmdatabaseacrosssectionalsurvey
AT hanjeonghyun perceivedriskofreidentificationinomopcdmdatabaseacrosssectionalsurvey
AT kimsoonseok perceivedriskofreidentificationinomopcdmdatabaseacrosssectionalsurvey
AT kimgitae perceivedriskofreidentificationinomopcdmdatabaseacrosssectionalsurvey
AT leeyura perceivedriskofreidentificationinomopcdmdatabaseacrosssectionalsurvey