An Approach to Reducing Information Loss and Achieving Diversity of Sensitive Attributes in k-anonymity Methods
Main authors: | Yoo, Sunyong; Shin, Moonshik; Lee, Doheon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications Inc., 2012 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3626125/ https://www.ncbi.nlm.nih.gov/pubmed/23612074 http://dx.doi.org/10.2196/ijmr.2140 |
_version_ | 1782266148206673920 |
---|---|
author | Yoo, Sunyong Shin, Moonshik Lee, Doheon |
collection | PubMed |
description | Electronic Health Records (EHRs) enable the sharing of patients’ medical data. Since EHRs include patients’ private data, access by researchers is restricted. Therefore, k-anonymity is necessary to keep patients’ private data safe without damaging useful medical information. However, k-anonymity cannot prevent sensitive attribute disclosure. An alternative, l-diversity, has been proposed as a solution to this problem and is defined as follows: each Q-block (ie, each set of rows corresponding to the same value for identifiers) contains at least l well-represented values for each sensitive attribute. While l-diversity protects against sensitive attribute disclosure, it is limited in that it focuses only on diversifying sensitive attributes. The aim of the study is to develop a k-anonymity method that not only minimizes information loss but also achieves diversity of the sensitive attribute. This paper proposes a new privacy protection method that uses conditional entropy and mutual information. This method considers both information loss and diversity of sensitive attributes. Conditional entropy can measure the information loss caused by generalization, and mutual information is used to achieve the diversity of sensitive attributes. This method can offer appropriate Q-blocks for generalization. We used the Adult database from the UCI Machine Learning Repository and found that the proposed method can greatly reduce information loss compared with a recent l-diversity study. It can also achieve the diversity of sensitive attributes, as measured by counting the number of Q-blocks that have leaks of diversity. This study provides a privacy protection method that can improve data utility and protect against sensitive attribute disclosure. The method is viable and should be of interest for further privacy protection in EHR applications. |
format | Online Article Text |
id | pubmed-3626125 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | JMIR Publications Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-3626125 2013-04-22 Interact J Med Res Original Paper JMIR Publications Inc. 2012-11-13 /pmc/articles/PMC3626125/ /pubmed/23612074 http://dx.doi.org/10.2196/ijmr.2140 Text en ©Sunyong Yoo, Moonshik Shin, Doheon Lee. Originally published in the Interactive Journal of Medical Research (http://www.i-jmr.org/), 13.11.2012. http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Interactive Journal of Medical Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.i-jmr.org/, as well as this copyright and license information must be included. |
title | An Approach to Reducing Information Loss and Achieving Diversity of Sensitive Attributes in k-anonymity Methods |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3626125/ https://www.ncbi.nlm.nih.gov/pubmed/23612074 http://dx.doi.org/10.2196/ijmr.2140 |
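The abstract names three quantities without defining them operationally: Q-blocks that must hold at least l well-represented sensitive values, conditional entropy as a measure of information loss from generalization, and mutual information as a guide to sensitive-attribute diversity. The Python sketch below illustrates one common reading of each; it is not the authors' algorithm. Assumptions not stated in the record: "well-represented" is read as distinct l-diversity (at least l distinct sensitive values per Q-block), information loss is read as H(original QI | generalized QI), and diversity leakage is read as I(sensitive attribute; Q-block). The toy records, the hypothetical `generalize()` helper, and all function names are illustrative and come neither from the paper nor from the UCI Adult data.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def conditional_entropy(pairs):
    """H(X | Y) in bits, estimated from a list of (x, y) observations."""
    by_y = defaultdict(Counter)
    for x, y in pairs:
        by_y[y][x] += 1
    n = len(pairs)
    # Weight each conditional entropy H(X | Y=y) by the empirical P(Y=y).
    return sum(sum(cx.values()) / n * entropy(cx) for cx in by_y.values())


def mutual_information(pairs):
    """I(X; Y) = H(X) - H(X | Y) in bits."""
    return entropy(Counter(x for x, _ in pairs)) - conditional_entropy(pairs)


def q_blocks(rows, qi_keys):
    """Group rows into Q-blocks: rows sharing identical quasi-identifier values."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[k] for k in qi_keys)].append(row)
    return blocks


def diversity_leaks(rows, qi_keys, sa_key, l):
    """Count Q-blocks with fewer than l distinct sensitive values (assumed distinct l-diversity)."""
    return sum(
        1
        for block in q_blocks(rows, qi_keys).values()
        if len({r[sa_key] for r in block}) < l
    )


def generalize(row):
    """Hypothetical generalization: age -> 10-year range, ZIP -> 3-digit prefix."""
    low = row["age"] // 10 * 10
    return {**row, "age": f"{low}-{low + 9}", "zip": row["zip"][:3] + "**"}


# Toy records (illustrative only, not from the UCI Adult data).
records = [
    {"age": 23, "zip": "13053", "disease": "flu"},
    {"age": 27, "zip": "13068", "disease": "cancer"},
    {"age": 25, "zip": "13068", "disease": "flu"},
    {"age": 41, "zip": "14850", "disease": "gastritis"},
    {"age": 45, "zip": "14853", "disease": "gastritis"},
    {"age": 48, "zip": "14850", "disease": "gastritis"},
]
QI, SA = ("age", "zip"), "disease"
generalized = [generalize(r) for r in records]

# (original QI, generalized QI) pairs: uncertainty left about the original values.
qi_pairs = [
    (tuple(o[k] for k in QI), tuple(g[k] for k in QI))
    for o, g in zip(records, generalized)
]
# (sensitive value, Q-block key) pairs: what a Q-block reveals about the disease.
sa_pairs = [(g[SA], tuple(g[k] for k in QI)) for g in generalized]

if __name__ == "__main__":
    print(f"information loss H(QI | gen. QI) = {conditional_entropy(qi_pairs):.3f} bits")  # 1.585
    print(f"I(disease; Q-block) = {mutual_information(sa_pairs):.3f} bits")  # 1.000
    print(f"Q-blocks failing distinct 2-diversity: {diversity_leaks(generalized, QI, SA, l=2)}")  # 1
```

In this toy table both Q-blocks satisfy 3-anonymity, yet the second block contains only "gastritis", so it counts as a diversity leak for l = 2 and accounts for the 1 bit of mutual information between Q-block and diagnosis. A generalization chosen to minimize information loss alone would not flag this, which illustrates why the abstract argues for weighing both information loss and sensitive-attribute diversity.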