Privacy Risks of Sharing Data from Environmental Health Studies

Bibliographic Details
Main Authors: Boronow, Katherine E., Perovich, Laura J., Sweeney, Latanya, Yoo, Ji Su, Rudel, Ruthann A., Brown, Phil, Brody, Julia Green
Format: Online Article Text
Language: English
Published: Environmental Health Perspectives 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7015543/
https://www.ncbi.nlm.nih.gov/pubmed/31922426
http://dx.doi.org/10.1289/EHP4817
_version_ 1783496816109551616
author Boronow, Katherine E.
Perovich, Laura J.
Sweeney, Latanya
Yoo, Ji Su
Rudel, Ruthann A.
Brown, Phil
Brody, Julia Green
collection PubMed
description BACKGROUND: Sharing research data uses resources effectively; enables large, diverse data sets; and supports rigor and reproducibility. However, sharing such data increases privacy risks for participants who may be re-identified by linking study data to outside data sets. These risks have been investigated for genetic and medical records but rarely for environmental data.
OBJECTIVES: We evaluated how data in environmental health (EH) studies may be vulnerable to linkage, and we investigated, in a case study, whether environmental measurements could contribute to inferring latent categories (e.g., geographic location), which increases privacy risks.
METHODS: We identified 12 prominent EH studies, reviewed the data types collected, and evaluated the availability of outside data sets that overlap with study data. With data from the Household Exposure Study in California and Massachusetts and the Green Housing Study in Boston, Massachusetts, and Cincinnati, Ohio, we used k-means clustering and principal component analysis to investigate whether participants’ region of residence could be inferred from measurements of chemicals in household air and dust.
RESULTS: All 12 studies included at least two of five data types that overlap with outside data sets: geographic location (9 studies), medical data (9 studies), occupation (10 studies), housing characteristics (10 studies), and genetic data (7 studies). In our cluster analysis, participants’ region of residence could be inferred with 80%–98% accuracy using environmental measurements with original laboratory reporting limits.
DISCUSSION: EH studies frequently include data that are vulnerable to linkage with voter lists, tax and real estate data, professional licensing lists, and ancestry websites, and exposure measurements may be used to identify subgroup membership, increasing likelihood of linkage. Thus, unsupervised sharing of EH research data potentially raises substantial privacy risks. Empirical research can help characterize risks and evaluate technical solutions. Our findings reinforce the need for legal and policy protections to shield participants from potential harms of re-identification from data sharing. https://doi.org/10.1289/EHP4817
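The cluster analysis named in the abstract (principal component analysis followed by k-means) can be illustrated with a minimal sketch. This is not the authors' pipeline and uses no study data: the measurements are synthetic, and the sample sizes, the number of chemicals, the size of the regional shift, and every function name are assumptions made for illustration only. It shows how an unsupervised grouping of chemical measurements can be compared with a withheld region label.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project standardized data onto its leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def kmeans(X, k=2, n_iter=100, n_init=5, seed=0):
    """Plain Lloyd's algorithm with a few random restarts (lowest SSE wins)."""
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Keep the old center if a cluster happens to be empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Recompute assignments against the final centers before scoring.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        sse = float((dists.min(axis=1) ** 2).sum())
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def two_cluster_accuracy(labels, truth):
    """Cluster ids are arbitrary, so score the better of the two matchings."""
    agree = float(np.mean(labels == truth))
    return max(agree, 1.0 - agree)


# Synthetic stand-in for household measurements (hypothetical values):
# 60 homes per region, 8 chemicals, 4 of which run higher in region B.
rng = np.random.default_rng(42)
n_per_region, n_chemicals = 60, 8
shift = np.zeros(n_chemicals)
shift[:4] = 3.0
region_a = rng.normal(0.0, 1.0, size=(n_per_region, n_chemicals))
region_b = rng.normal(0.0, 1.0, size=(n_per_region, n_chemicals)) + shift
X = np.vstack([region_a, region_b])
truth = np.repeat([0, 1], n_per_region)  # withheld from the clustering step

labels = kmeans(pca_scores(X), k=2)
accuracy = two_cluster_accuracy(labels, truth)
print(f"share of homes assigned to the matching region cluster: {accuracy:.2f}")
```

The region label is used only to score the result, never to fit it, which is the sense in which a latent category might be inferred from exposure data alone. Real measurements would also need handling of values below laboratory reporting limits, which this sketch ignores.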
format Online
Article
Text
id pubmed-7015543
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Environmental Health Perspectives
record_format MEDLINE/PubMed
spelling pubmed-7015543 2020-02-14 Environ Health Perspect Research Environmental Health Perspectives 2020-01-10 /pmc/articles/PMC7015543/ /pubmed/31922426 http://dx.doi.org/10.1289/EHP4817 Text en EHP is an open-access journal published with support from the National Institute of Environmental Health Sciences, National Institutes of Health. All content is public domain unless otherwise noted.
title Privacy Risks of Sharing Data from Environmental Health Studies
topic Research