Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study
BACKGROUND: Data science offers an unparalleled opportunity to identify new insights into many aspects of human life with recent advances in health care. Using data science in digital health raises significant challenges regarding data privacy, transparency, and trustworthiness. Recent regulations e...
Main Authors: | Zuo, Zheming; Watson, Matthew; Budgen, David; Hall, Robert; Kennelly, Chris; Al Moubayed, Noura |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8556642/ https://www.ncbi.nlm.nih.gov/pubmed/34652278 http://dx.doi.org/10.2196/29871 |
_version_ | 1784592211481985024 |
---|---|
author | Zuo, Zheming Watson, Matthew Budgen, David Hall, Robert Kennelly, Chris Al Moubayed, Noura |
author_facet | Zuo, Zheming Watson, Matthew Budgen, David Hall, Robert Kennelly, Chris Al Moubayed, Noura |
author_sort | Zuo, Zheming |
collection | PubMed |
description | BACKGROUND: Data science offers an unparalleled opportunity to identify new insights into many aspects of human life with recent advances in health care. Using data science in digital health raises significant challenges regarding data privacy, transparency, and trustworthiness. Recent regulations enforce the need for a clear legal basis for collecting, processing, and sharing data, for example, the European Union’s General Data Protection Regulation (2016) and the United Kingdom’s Data Protection Act (2018). For health care providers, legal use of the electronic health record (EHR) is permitted only in clinical care cases. Any other use of the data requires thoughtful considerations of the legal context and direct patient consent. Identifiable personal and sensitive information must be sufficiently anonymized. Raw data are commonly anonymized to be used for research purposes, with risk assessment for reidentification and utility. Although health care organizations have internal policies defined for information governance, there is a significant lack of practical tools and intuitive guidance about the use of data for research and modeling. Off-the-shelf data anonymization tools are developed frequently, but privacy-related functionalities are often incomparable with regard to use in different problem domains. In addition, tools to support measuring the risk of the anonymized data with regard to reidentification against the usefulness of the data exist, but there are question marks over their efficacy. OBJECTIVE: In this systematic literature mapping study, we aim to alleviate the aforementioned issues by reviewing the landscape of data anonymization for digital health care. METHODS: We used Google Scholar, Web of Science, Elsevier Scopus, and PubMed to retrieve academic studies published in English up to June 2020. Noteworthy gray literature was also used to initialize the search. 
We focused on review questions covering 5 bottom-up aspects: basic anonymization operations, privacy models, reidentification risk and usability metrics, off-the-shelf anonymization tools, and the lawful basis for EHR data anonymization. RESULTS: We identified 239 eligible studies, of which 60 were chosen for general background information; 16 were selected for 7 basic anonymization operations; 104 covered 72 conventional and machine learning–based privacy models; four and 19 papers included seven and 15 metrics, respectively, for measuring the reidentification risk and degree of usability; and 36 explored 20 data anonymization software tools. In addition, we also evaluated the practical feasibility of performing anonymization on EHR data with reference to their usability in medical decision-making. Furthermore, we summarized the lawful basis for delivering guidance on practical EHR data anonymization. CONCLUSIONS: This systematic literature mapping study indicates that anonymization of EHR data is theoretically achievable; yet, it requires more research efforts in practical implementations to balance privacy preservation and usability to ensure more reliable health care applications. |
format | Online Article Text |
id | pubmed-8556642 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-85566422021-11-10 Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study Zuo, Zheming Watson, Matthew Budgen, David Hall, Robert Kennelly, Chris Al Moubayed, Noura JMIR Med Inform Review BACKGROUND: Data science offers an unparalleled opportunity to identify new insights into many aspects of human life with recent advances in health care. Using data science in digital health raises significant challenges regarding data privacy, transparency, and trustworthiness. Recent regulations enforce the need for a clear legal basis for collecting, processing, and sharing data, for example, the European Union’s General Data Protection Regulation (2016) and the United Kingdom’s Data Protection Act (2018). For health care providers, legal use of the electronic health record (EHR) is permitted only in clinical care cases. Any other use of the data requires thoughtful considerations of the legal context and direct patient consent. Identifiable personal and sensitive information must be sufficiently anonymized. Raw data are commonly anonymized to be used for research purposes, with risk assessment for reidentification and utility. Although health care organizations have internal policies defined for information governance, there is a significant lack of practical tools and intuitive guidance about the use of data for research and modeling. Off-the-shelf data anonymization tools are developed frequently, but privacy-related functionalities are often incomparable with regard to use in different problem domains. In addition, tools to support measuring the risk of the anonymized data with regard to reidentification against the usefulness of the data exist, but there are question marks over their efficacy. OBJECTIVE: In this systematic literature mapping study, we aim to alleviate the aforementioned issues by reviewing the landscape of data anonymization for digital health care. 
METHODS: We used Google Scholar, Web of Science, Elsevier Scopus, and PubMed to retrieve academic studies published in English up to June 2020. Noteworthy gray literature was also used to initialize the search. We focused on review questions covering 5 bottom-up aspects: basic anonymization operations, privacy models, reidentification risk and usability metrics, off-the-shelf anonymization tools, and the lawful basis for EHR data anonymization. RESULTS: We identified 239 eligible studies, of which 60 were chosen for general background information; 16 were selected for 7 basic anonymization operations; 104 covered 72 conventional and machine learning–based privacy models; four and 19 papers included seven and 15 metrics, respectively, for measuring the reidentification risk and degree of usability; and 36 explored 20 data anonymization software tools. In addition, we also evaluated the practical feasibility of performing anonymization on EHR data with reference to their usability in medical decision-making. Furthermore, we summarized the lawful basis for delivering guidance on practical EHR data anonymization. CONCLUSIONS: This systematic literature mapping study indicates that anonymization of EHR data is theoretically achievable; yet, it requires more research efforts in practical implementations to balance privacy preservation and usability to ensure more reliable health care applications. JMIR Publications 2021-10-15 /pmc/articles/PMC8556642/ /pubmed/34652278 http://dx.doi.org/10.2196/29871 Text en ©Zheming Zuo, Matthew Watson, David Budgen, Robert Hall, Chris Kennelly, Noura Al Moubayed. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 15.10.2021. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Review Zuo, Zheming Watson, Matthew Budgen, David Hall, Robert Kennelly, Chris Al Moubayed, Noura Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study |
title | Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study |
title_full | Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study |
title_fullStr | Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study |
title_full_unstemmed | Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study |
title_short | Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study |
title_sort | data anonymization for pervasive health care: systematic literature mapping study |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8556642/ https://www.ncbi.nlm.nih.gov/pubmed/34652278 http://dx.doi.org/10.2196/29871 |