Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review

Bibliographic Details
Main Authors: Chevrier, Raphaël, Foufi, Vasiliki, Gaudet-Blavignac, Christophe, Robert, Arnaud, Lovis, Christian
Format: Online Article Text
Language: English
Published: JMIR Publications 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6658290/
https://www.ncbi.nlm.nih.gov/pubmed/31152528
http://dx.doi.org/10.2196/13484
author Chevrier, Raphaël
Foufi, Vasiliki
Gaudet-Blavignac, Christophe
Robert, Arnaud
Lovis, Christian
author_sort Chevrier, Raphaël
collection PubMed
description BACKGROUND: The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (eg, making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients’ privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find a consensus on the definitions of the concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the ratio of risk between the data subjects’ privacy on one side, and the benefit of scientific advances on the other.
OBJECTIVE: This work aims at better understanding how the research community comprehends and defines the concepts of de-identification and anonymization. A rich overview should also provide insights into the use and reliability of the methods. Six aspects will be studied: (1) terminology and definitions, (2) backgrounds and places of work of the researchers, (3) reasons for anonymizing or de-identifying health data, (4) limitations of the techniques, (5) legal and ethical aspects, and (6) recommendations of the researchers.
METHODS: Based on a scoping review protocol designed a priori, MEDLINE was searched for publications discussing de-identification or anonymization and published between 2007 and 2017. The search was restricted to MEDLINE to focus on the life sciences community. The screening process was performed by two reviewers independently.
RESULTS: After searching 7972 records that matched at least one search term, 135 publications were screened and 60 full-text articles were included. (1) Terminology: Definitions of the terms de-identification and anonymization were provided in less than half of the articles (29/60, 48%). When both terms were used (41/60, 68%), their meanings divided the authors into two equal groups (19/60, 32%, each) with opposed views. The remaining articles (3/60, 5%) were equivocal. (2) Backgrounds and locations: Research groups were based predominantly in North America (31/60, 52%) and in the European Union (22/60, 37%). The authors came from 19 different domains; computer science (91/248, 36.7%), biomedical informatics (47/248, 19.0%), and medicine (38/248, 15.3%) were the most prevalent ones. (3) Purpose: The main reason declared for applying these techniques is to facilitate biomedical research. (4) Limitations: Progress is made on specific techniques but, overall, limitations remain numerous. (5) Legal and ethical aspects: Differences exist between nations in the definitions, approaches, and legal practices. (6) Recommendations: The combination of organizational, legal, ethical, and technical approaches is necessary to protect health data.
CONCLUSIONS: Interest is growing for privacy-enhancing techniques in the life sciences community. This interest crosses scientific boundaries, involving primarily computer science, biomedical informatics, and medicine. The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject. The same observation applies to the methods. Several legislations, such as the American Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR), regulate the domain. Using the definitions they provide could help address the variable use of these two concepts in the research community.
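The counts and percentages quoted in the RESULTS paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration and is not part of the catalog record or of the article: the names `REPORTED`, `percentage`, and `inconsistent_entries` are hypothetical, and the figures are copied from the abstract as printed above.

```python
# Figures copied from the RESULTS paragraph of the abstract.
# Each entry: (numerator, denominator, reported percentage, decimals used).
# All names here are hypothetical helpers, not part of the source record.
REPORTED = {
    "articles defining the terms": (29, 60, 48, 0),
    "articles using both terms": (41, 60, 68, 0),
    "each of the two opposed groups": (19, 60, 32, 0),
    "equivocal articles": (3, 60, 5, 0),
    "groups in North America": (31, 60, 52, 0),
    "groups in the European Union": (22, 60, 37, 0),
    "authors in computer science": (91, 248, 36.7, 1),
    "authors in biomedical informatics": (47, 248, 19.0, 1),
    "authors in medicine": (38, 248, 15.3, 1),
}

# The two opposed groups plus the equivocal articles should add up to
# the number of articles that used both terms.
SUBGROUP_TOTAL = 19 + 19 + 3


def percentage(numerator, denominator, decimals=0):
    """Return numerator/denominator as a percentage rounded to `decimals`."""
    return round(100 * numerator / denominator, decimals)


def inconsistent_entries(reported):
    """List the labels whose reported percentage does not match the counts."""
    return [
        label
        for label, (num, den, pct, decimals) in reported.items()
        if percentage(num, den, decimals) != pct
    ]


if __name__ == "__main__":
    print("Inconsistent entries:", inconsistent_entries(REPORTED))
    print("Subgroups sum to 41:", SUBGROUP_TOTAL == 41)
```

An empty list from `inconsistent_entries` would indicate that every percentage agrees with its count at the stated precision.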
format Online
Article
Text
id pubmed-6658290
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-6658290 2019-07-31 Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review Chevrier, Raphaël; Foufi, Vasiliki; Gaudet-Blavignac, Christophe; Robert, Arnaud; Lovis, Christian J Med Internet Res Review JMIR Publications 2019-05-31 /pmc/articles/PMC6658290/ /pubmed/31152528 http://dx.doi.org/10.2196/13484 Text en ©Raphaël Chevrier, Vasiliki Foufi, Christophe Gaudet-Blavignac, Arnaud Robert, Christian Lovis. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 31.05.2019. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
title Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review
topic Review