Open tools for quantitative anonymization of tabular phenotype data: literature review

Precision medicine relies on molecular and systems biology methods as well as bidirectional association studies of phenotypes and (high-throughput) genomic data. However, the integrated use of such data often faces obstacles, especially with regard to data protection. An important prerequisite for re...

Bibliographic Details
Main authors: Haber, Anna C; Sax, Ulrich; Prasser, Fabian
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Review
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677485/
https://www.ncbi.nlm.nih.gov/pubmed/36215114
http://dx.doi.org/10.1093/bib/bbac440
author Haber, Anna C
Sax, Ulrich
Prasser, Fabian
collection PubMed
description Precision medicine relies on molecular and systems biology methods as well as bidirectional association studies of phenotypes and (high-throughput) genomic data. However, the integrated use of such data often faces obstacles, especially with regard to data protection. Informed consent is usually an important prerequisite for processing research data, but collecting consent is not always feasible, in particular when data are to be analyzed retrospectively. For phenotype data, anonymization, i.e. altering data in such a way that individuals cannot be identified, can provide an alternative. Several re-identification attacks have shown that this is a complex task and that simply removing directly identifying attributes such as names is usually not enough. More formal approaches are needed that use mathematical models to quantify risks and guide their reduction. Because these techniques are complex, implementing them from scratch is challenging and not advisable. Open software libraries and tools can provide a robust alternative. However, the range of available anonymization tools is also heterogeneous, and obtaining an overview of their strengths and weaknesses is difficult due to the complexity of the problem space. We therefore performed a systematic review of open anonymization tools for structured phenotype data described in the literature between 1990 and 2021. Through a two-step eligibility assessment process, we selected 13 tools for an in-depth analysis. By comparing the supported anonymization techniques and further aspects, such as maturity, we derive recommendations for tools to use for anonymizing phenotype datasets with different properties.
format Online
Article
Text
id pubmed-9677485
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9677485 2022-11-21 Open tools for quantitative anonymization of tabular phenotype data: literature review Haber, Anna C; Sax, Ulrich; Prasser, Fabian Brief Bioinform Review
Oxford University Press 2022-10-10 /pmc/articles/PMC9677485/ /pubmed/36215114 http://dx.doi.org/10.1093/bib/bbac440 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Open tools for quantitative anonymization of tabular phenotype data: literature review
topic Review