
The re-identification risk of Canadians from longitudinal demographics


Bibliographic Details
Main Authors: El Emam, Khaled; Buckeridge, David; Tamblyn, Robyn; Neisa, Angelica; Jonker, Elizabeth; Verma, Aman
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3151203/
https://www.ncbi.nlm.nih.gov/pubmed/21696636
http://dx.doi.org/10.1186/1472-6947-11-46
author El Emam, Khaled
Buckeridge, David
Tamblyn, Robyn
Neisa, Angelica
Jonker, Elizabeth
Verma, Aman
collection PubMed
description BACKGROUND: The public is less willing to allow their personal health information to be disclosed for research purposes if they do not trust researchers and how researchers manage their data. However, the public is more comfortable with their data being used for research if the risk of re-identification is low. There are few studies on the risk of re-identification of Canadians from their basic demographics, and no studies on their risk from their longitudinal data. Our objective was to estimate the risk of re-identification from the basic cross-sectional and longitudinal demographics of Canadians.
METHODS: Uniqueness is a common measure of re-identification risk. Demographic data on a 25% random sample of the population of Montreal were analyzed to estimate population uniqueness on postal code, date of birth, and gender as well as their generalizations, for periods ranging from 1 year to 11 years.
RESULTS: Almost 98% of the population was unique on full postal code, date of birth, and gender: these three variables are effectively a unique identifier for Montrealers. Uniqueness increased for longitudinal data. Considerable generalization was required to reach acceptably low uniqueness levels, especially for longitudinal data. Detailed guidelines and disclosure policies on how to ensure that the re-identification risk is low are provided.
CONCLUSIONS: A large percentage of Montreal residents are unique on basic demographics. For non-longitudinal data sets, the three-character postal code, gender, and month/year of birth represent sufficiently low re-identification risk. Data custodians need to generalize their demographic information further for longitudinal data sets.
format Online
Article
Text
id pubmed-3151203
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3151203 2011-08-06 BMC Med Inform Decis Mak Research Article
BioMed Central 2011-06-22 /pmc/articles/PMC3151203/ /pubmed/21696636 http://dx.doi.org/10.1186/1472-6947-11-46 Text en Copyright ©2011 El Emam et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The re-identification risk of Canadians from longitudinal demographics
topic Research Article