
Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records

BACKGROUND: Sharing of research data derived from health system records supports the rigor and reproducibility of primary research and can accelerate research progress through secondary use. But public sharing of such data can create risk of re-identifying individuals, exposing sensitive health info...


Bibliographic Details
Main Authors: Simon, Gregory E., Shortreed, Susan M., Coley, R. Yates, Penfold, Robert B., Rossom, Rebecca C., Waitzfelder, Beth E., Sanchez, Katherine, Lynch, Frances L.
Format: Online Article Text
Language: English
Published: Ubiquity Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6450246/
https://www.ncbi.nlm.nih.gov/pubmed/30972355
http://dx.doi.org/10.5334/egems.270
_version_ 1783409003174297600
author Simon, Gregory E.
Shortreed, Susan M.
Coley, R. Yates
Penfold, Robert B.
Rossom, Rebecca C.
Waitzfelder, Beth E.
Sanchez, Katherine
Lynch, Frances L.
author_facet Simon, Gregory E.
Shortreed, Susan M.
Coley, R. Yates
Penfold, Robert B.
Rossom, Rebecca C.
Waitzfelder, Beth E.
Sanchez, Katherine
Lynch, Frances L.
author_sort Simon, Gregory E.
collection PubMed
description BACKGROUND: Sharing of research data derived from health system records supports the rigor and reproducibility of primary research and can accelerate research progress through secondary use. But public sharing of such data can create risk of re-identifying individuals, exposing sensitive health information. METHOD: We describe a framework for assessing re-identification risk that includes: identifying data elements in a research dataset that overlap with external data sources, identifying small classes of records defined by unique combinations of those data elements, and considering the pattern of population overlap between the research dataset and an external source. We also describe alternative strategies for mitigating risk when the external data source can or cannot be directly examined. RESULTS: We illustrate this framework using the example of a large database used to develop and validate models predicting suicidal behavior after an outpatient visit. We identify elements in the research dataset that might create risk and propose a specific risk mitigation strategy: deleting indicators for health system (a proxy for state of residence) and visit year. DISCUSSION: Researchers holding health system data must balance the public health value of data sharing against the duty to protect the privacy of health system members. Specific steps can provide a useful estimate of re-identification risk and point to effective risk mitigation strategies.
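The step described in the abstract above, finding small classes of records defined by unique combinations of overlapping data elements and then deleting the health system and visit year indicators, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the article: the field names (`health_system`, `visit_year`, `age_group`, `sex`), the toy records, the threshold of 2, and the helper names `small_classes` and `drop_fields`.

```python
from collections import Counter

def small_classes(records, quasi_identifiers, threshold):
    """Return each combination of quasi-identifier values that is shared
    by fewer than `threshold` records, with its record count."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < threshold}

def drop_fields(records, fields):
    """Return copies of the records with the named fields deleted."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]

# Hypothetical toy data; real research datasets would be far larger.
records = [
    {"health_system": "A", "visit_year": 2012, "age_group": "18-29", "sex": "F"},
    {"health_system": "A", "visit_year": 2013, "age_group": "18-29", "sex": "F"},
    {"health_system": "B", "visit_year": 2012, "age_group": "18-29", "sex": "F"},
    {"health_system": "B", "visit_year": 2012, "age_group": "65+", "sex": "M"},
    {"health_system": "A", "visit_year": 2013, "age_group": "65+", "sex": "M"},
]

# Small classes when all four (assumed) overlapping elements are kept.
before = small_classes(
    records, ["health_system", "visit_year", "age_group", "sex"], threshold=2
)

# Mitigation named in the abstract: delete health system and visit year.
reduced = drop_fields(records, {"health_system", "visit_year"})
after = small_classes(reduced, ["age_group", "sex"], threshold=2)

print(len(before), "small classes before;", len(after), "after")
```

In this toy data every record is unique on the full set of elements, and no class falls below the assumed threshold once the two indicators are deleted. The sketch covers only the counting step; it does not model the pattern of population overlap with an external source, which the abstract lists as a separate part of the framework.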
format Online
Article
Text
id pubmed-6450246
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Ubiquity Press
record_format MEDLINE/PubMed
spelling pubmed-64502462019-04-10 Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records Simon, Gregory E. Shortreed, Susan M. Coley, R. Yates Penfold, Robert B. Rossom, Rebecca C. Waitzfelder, Beth E. Sanchez, Katherine Lynch, Frances L. EGEMS (Wash DC) Model/Framework BACKGROUND: Sharing of research data derived from health system records supports the rigor and reproducibility of primary research and can accelerate research progress through secondary use. But public sharing of such data can create risk of re-identifying individuals, exposing sensitive health information. METHOD: We describe a framework for assessing re-identification risk that includes: identifying data elements in a research dataset that overlap with external data sources, identifying small classes of records defined by unique combinations of those data elements, and considering the pattern of population overlap between the research dataset and an external source. We also describe alternative strategies for mitigating risk when the external data source can or cannot be directly examined. RESULTS: We illustrate this framework using the example of a large database used to develop and validate models predicting suicidal behavior after an outpatient visit. We identify elements in the research dataset that might create risk and propose a specific risk mitigation strategy: deleting indicators for health system (a proxy for state of residence) and visit year. DISCUSSION: Researchers holding health system data must balance the public health value of data sharing against the duty to protect the privacy of health system members. Specific steps can provide a useful estimate of re-identification risk and point to effective risk mitigation strategies. 
Ubiquity Press 2019-03-29 /pmc/articles/PMC6450246/ /pubmed/30972355 http://dx.doi.org/10.5334/egems.270 Text en Copyright: © 2019 The Author(s) http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.
spellingShingle Model/Framework
Simon, Gregory E.
Shortreed, Susan M.
Coley, R. Yates
Penfold, Robert B.
Rossom, Rebecca C.
Waitzfelder, Beth E.
Sanchez, Katherine
Lynch, Frances L.
Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records
title Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records
title_full Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records
title_fullStr Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records
title_full_unstemmed Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records
title_short Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records
title_sort assessing and minimizing re-identification risk in research data derived from health care records
topic Model/Framework
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6450246/
https://www.ncbi.nlm.nih.gov/pubmed/30972355
http://dx.doi.org/10.5334/egems.270
work_keys_str_mv AT simongregorye assessingandminimizingreidentificationriskinresearchdataderivedfromhealthcarerecords
AT shortreedsusanm assessingandminimizingreidentificationriskinresearchdataderivedfromhealthcarerecords
AT coleyryates assessingandminimizingreidentificationriskinresearchdataderivedfromhealthcarerecords
AT penfoldrobertb assessingandminimizingreidentificationriskinresearchdataderivedfromhealthcarerecords
AT rossomrebeccac assessingandminimizingreidentificationriskinresearchdataderivedfromhealthcarerecords
AT waitzfelderbethe assessingandminimizingreidentificationriskinresearchdataderivedfromhealthcarerecords
AT sanchezkatherine assessingandminimizingreidentificationriskinresearchdataderivedfromhealthcarerecords
AT lynchfrancesl assessingandminimizingreidentificationriskinresearchdataderivedfromhealthcarerecords