Privacy preserving data anonymization of spontaneous ADE reporting system dataset
BACKGROUND: To facilitate long-term safety surveillance of marketed drugs, many spontaneous reporting systems (SRSs) for ADR events have been established worldwide. Since the data collected by SRSs contain sensitive personal health information that should be protected to prevent the identification...
Main Authors: | Lin, Wen-Yang; Yang, Duen-Chuan; Wang, Jie-Teng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2016 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4959360/ https://www.ncbi.nlm.nih.gov/pubmed/27454754 http://dx.doi.org/10.1186/s12911-016-0293-4 |
_version_ | 1782444390314147840 |
---|---|
author | Lin, Wen-Yang Yang, Duen-Chuan Wang, Jie-Teng |
author_facet | Lin, Wen-Yang Yang, Duen-Chuan Wang, Jie-Teng |
author_sort | Lin, Wen-Yang |
collection | PubMed |
description | BACKGROUND: To facilitate long-term safety surveillance of marketed drugs, many spontaneous reporting systems (SRSs) for ADR events have been established worldwide. Since the data collected by SRSs contain sensitive personal health information that should be protected to prevent the identification of individuals, this raises the issue of privacy-preserving data publishing (PPDP), that is, how to sanitize (anonymize) raw data before publishing. Although much work has been done on PPDP, very few studies have focused on protecting the privacy of SRS data, and none of the existing anonymization methods is well suited to SRS datasets, because such datasets exhibit characteristics such as rare events, multiple records per individual, and multi-valued sensitive attributes. METHODS: We propose a new privacy model called MS(k, θ(*))-bounding for protecting published spontaneous ADE reporting data from privacy attacks. Our model offers the flexibility of varying privacy thresholds, i.e., θ(*), for different sensitive values and takes the characteristics of SRS data into consideration. We also propose an anonymization algorithm for sanitizing the raw data to meet the requirements specified through the proposed model. Our algorithm adopts a greedy clustering strategy to group the records into clusters, conforming to an innovative anonymization metric that aims to minimize the privacy risk while maintaining data utility for ADR detection. An empirical study was conducted using the FAERS dataset from 2004Q1 to 2011Q4. We compared our model with four prevailing methods, namely k-anonymity, (X, Y)-anonymity, Multi-sensitive l-diversity, and (α, k)-anonymity, evaluated via two measures, Danger Ratio (DR) and Information Loss (IL), and considered three different scenarios of threshold setting for θ(*): uniform, level-wise, and frequency-based. We also conducted experiments to inspect the impact of anonymized data on the strengths of discovered ADR signals. RESULTS: With all three threshold settings for sensitive values, our method successfully prevents the disclosure of sensitive values (nearly all observed DRs are zero) without sacrificing too much data utility. With a non-uniform threshold setting, either level-wise or frequency-based, our MS(k, θ(*))-bounding exhibits the best data utility and the least privacy risk among all the models. The experiments conducted on selected ADR signals from MedWatch show that only very small differences in signal strength (PRR or ROR) were observed. These results show that our method can effectively prevent the disclosure of patients' sensitive information without sacrificing data utility for ADR signal detection. CONCLUSIONS: We propose a new privacy model for protecting SRS data, which possess characteristics overlooked by contemporary models, and an anonymization algorithm to sanitize SRS data in accordance with the proposed model. Empirical evaluation on a real SRS dataset, i.e., FAERS, shows that our method can effectively solve the privacy problem in SRS data without affecting ADR signal strength. (An illustrative code sketch of the PRR/ROR measures and the group-size/threshold checks mentioned above appears after the record fields below.) |
format | Online Article Text |
id | pubmed-4959360 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4959360 2016-08-01 Privacy preserving data anonymization of spontaneous ADE reporting system dataset Lin, Wen-Yang Yang, Duen-Chuan Wang, Jie-Teng BMC Med Inform Decis Mak Research BioMed Central 2016-07-18 /pmc/articles/PMC4959360/ /pubmed/27454754 http://dx.doi.org/10.1186/s12911-016-0293-4 Text en © Lin et al.
2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Lin, Wen-Yang Yang, Duen-Chuan Wang, Jie-Teng Privacy preserving data anonymization of spontaneous ADE reporting system dataset |
title | Privacy preserving data anonymization of spontaneous ADE reporting system dataset |
title_full | Privacy preserving data anonymization of spontaneous ADE reporting system dataset |
title_fullStr | Privacy preserving data anonymization of spontaneous ADE reporting system dataset |
title_full_unstemmed | Privacy preserving data anonymization of spontaneous ADE reporting system dataset |
title_short | Privacy preserving data anonymization of spontaneous ADE reporting system dataset |
title_sort | privacy preserving data anonymization of spontaneous ade reporting system dataset |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4959360/ https://www.ncbi.nlm.nih.gov/pubmed/27454754 http://dx.doi.org/10.1186/s12911-016-0293-4 |
work_keys_str_mv | AT linwenyang privacypreservingdataanonymizationofspontaneousadereportingsystemdataset AT yangduenchuan privacypreservingdataanonymizationofspontaneousadereportingsystemdataset AT wangjieteng privacypreservingdataanonymizationofspontaneousadereportingsystemdataset |
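The abstract above refers to k-anonymity, per-sensitive-value privacy thresholds θ(*), and the disproportionality measures PRR and ROR used to gauge ADR signal strength. The sketch below is a minimal, hypothetical Python illustration of those general notions only; it is not the authors' MS(k, θ(*))-bounding algorithm, it does not reproduce the paper's Danger Ratio or Information Loss metrics, and the record fields ("age", "gender", "reaction"), thresholds, and counts are invented for the example.

```python
# Minimal, illustrative sketch only -- NOT the authors' MS(k, theta*)-bounding
# algorithm or its DR/IL metrics. Field names, thresholds, and counts are
# hypothetical.
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def prr_ror(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """PRR and ROR from the standard 2x2 contingency table:
       a = reports with the target drug and the target ADR
       b = reports with the target drug and other ADRs
       c = reports with other drugs and the target ADR
       d = reports with other drugs and other ADRs."""
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    return prr, ror


def satisfies_k_anonymity(records: Iterable[Mapping[str, str]],
                          quasi_identifiers: Sequence[str],
                          k: int) -> bool:
    """True if every combination of quasi-identifier values is shared by at
    least k records (the basic k-anonymity requirement)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())


def satisfies_value_bounds(records: Iterable[Mapping[str, str]],
                           quasi_identifiers: Sequence[str],
                           sensitive: str,
                           thresholds: Dict[str, float]) -> bool:
    """True if, within every quasi-identifier group, the fraction of records
    carrying a sensitive value stays at or below that value's threshold.
    This only illustrates the general idea of per-value frequency bounds
    (as in (alpha, k)-anonymity); it is not the paper's exact definition."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    for values in groups.values():
        counts = Counter(values)
        for value, count in counts.items():
            if count / len(values) > thresholds.get(value, 1.0):
                return False
    return True


if __name__ == "__main__":
    # Hypothetical counts: 20 of 1,000 reports for the drug mention the ADR,
    # versus 40 of 10,000 reports for all other drugs.
    print(prr_ror(a=20, b=980, c=40, d=9960))

    reports = [
        {"age": "60-69", "gender": "F", "reaction": "nausea"},
        {"age": "60-69", "gender": "F", "reaction": "rash"},
        {"age": "60-69", "gender": "F", "reaction": "HIV"},
    ]
    print(satisfies_k_anonymity(reports, ["age", "gender"], k=3))
    print(satisfies_value_bounds(reports, ["age", "gender"],
                                 "reaction", {"HIV": 0.25}))
```

For the made-up counts the script prints a PRR of about 5.0 and an ROR of about 5.08, reports the three toy records as 3-anonymous over age and gender, and flags the 0.25 frequency bound on "HIV" as violated (one of three records in the group carries that value).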