Evaluating Common De-Identification Heuristics for Personal Health Information

Bibliographic Details
Main authors:	El Emam, Khaled; Jabbouri, Sam; Sams, Scott; Drouet, Youenn; Power, Michael
Format:	Text
Language:	English
Published:	Gunther Eysenbach 2006
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1794009/ https://www.ncbi.nlm.nih.gov/pubmed/17213047 http://dx.doi.org/10.2196/jmir.8.4.e28

Abstract
BACKGROUND: With the growing adoption of electronic medical records, there are increasing demands for the use of this electronic clinical data in observational research. A frequent ethics board requirement for such secondary use of personal health information in observational research is that the data be de-identified. De-identification heuristics are provided in the Health Insurance Portability and Accountability Act Privacy Rule, funding agency and professional association privacy guidelines, and common practice.

OBJECTIVE: The aim of the study was to evaluate whether the re-identification risks due to record linkage are sufficiently low when following common de-identification heuristics and whether the risk is stable across sample sizes and data sets.

METHODS: Two methods were followed to construct identification data sets. Re-identification attacks were simulated on these. For each data set we varied the sample size down to 30 individuals, and for each sample size evaluated the risk of re-identification for all combinations of quasi-identifiers. The combinations of quasi-identifiers that were low risk more than 50% of the time were considered stable.

RESULTS: The identification data sets we were able to construct were the list of all physicians and the list of all lawyers registered in Ontario, using 1% sampling fractions. The quasi-identifiers of region, gender, and year of birth were found to be low risk more than 50% of the time across both data sets. The combination of gender and region was also found to be low risk more than 50% of the time. We were not able to create an identification data set for the whole population.

CONCLUSIONS: Existing Canadian federal and provincial privacy laws help explain why it is difficult to create an identification data set for the whole population. That such examples of high re-identification risk exist for mainstream professions makes a strong case for not disclosing the high-risk variables and their combinations identified here. For professional subpopulations with published membership lists, many variables often needed by researchers would have to be excluded or generalized to ensure consistently low re-identification risk. Data custodians and researchers need to consider other statistical disclosure techniques for protecting privacy.
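The stability evaluation described in the abstract's METHODS can be illustrated with a small simulation. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the released sample is drawn from the identification data set (for example, a professional registry), that a sampled record's re-identification risk is 1/F where F is the number of registry records sharing its quasi-identifier values, and that "low risk" means the largest such risk is at or below a hypothetical threshold of 0.2. The quasi-identifier names, the threshold, the number of repeated draws per sample size, and the helper names `max_reid_risk` and `stable_combinations` are all illustrative assumptions.

```python
import itertools
import random
from collections import Counter

# Illustrative quasi-identifiers; the paper's exact variables and their coding are not reproduced.
QUASI_IDENTIFIERS = ("region", "gender", "year_of_birth")
# Hypothetical cut-off for "low risk"; an assumption, not a value taken from the paper.
RISK_THRESHOLD = 0.2


def max_reid_risk(sample, population, qis):
    """Return the largest re-identification risk among records in `sample`.

    Assumes an intruder links each sample record to the identification data set
    (`population`) on the quasi-identifiers `qis`. A record's risk is 1 / F, where F
    is the number of population records sharing its quasi-identifier values.
    `sample` must be drawn from `population`, so every F is at least 1.
    """
    def key(record):
        return tuple(record[q] for q in qis)

    class_sizes = Counter(key(r) for r in population)
    return max(1.0 / class_sizes[key(r)] for r in sample)


def stable_combinations(population, sample_sizes, trials=20, seed=0):
    """Return quasi-identifier combinations that are low risk in more than 50% of draws.

    For every combination of QUASI_IDENTIFIERS and every sample size, `trials`
    random samples are drawn without replacement and checked against RISK_THRESHOLD.
    """
    rng = random.Random(seed)
    stable = []
    for k in range(1, len(QUASI_IDENTIFIERS) + 1):
        for qis in itertools.combinations(QUASI_IDENTIFIERS, k):
            low_risk = total = 0
            for n in sample_sizes:
                for _ in range(trials):
                    sample = rng.sample(population, n)
                    if max_reid_risk(sample, population, qis) <= RISK_THRESHOLD:
                        low_risk += 1
                    total += 1
            # "Stable" here means low risk in more than half of all simulated releases.
            if low_risk / total > 0.5:
                stable.append(qis)
    return stable


if __name__ == "__main__":
    # Synthetic stand-in for a professional registry; all values are made up.
    gen = random.Random(42)
    registry = [
        {
            "region": gen.choice("ABCDE"),
            "gender": gen.choice("MF"),
            "year_of_birth": gen.randint(1940, 1980),
        }
        for _ in range(2000)
    ]
    print(stable_combinations(registry, sample_sizes=[30, 100, 500]))
```

Using the maximum per-record risk is a conservative reading of "low risk"; an average risk, or the proportion of records above the threshold, would be equally plausible interpretations and could produce a different set of stable combinations.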

Published in J Med Internet Res (Original Paper), 2006-11-21. PMC1794009; PubMed 17213047; DOI 10.2196/jmir.8.4.e28.

© Khaled El Emam, Sam Jabbouri, Scott Sams, Youenn Drouet, Michael Power. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 21.11.2006. Except where otherwise noted, articles published in the Journal of Medical Internet Research are distributed under the terms of the Creative Commons Attribution License (http://www.creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited, including full bibliographic details and the URL (see "please cite as" above), and this statement is included.