Open tools for quantitative anonymization of tabular phenotype data: literature review
Main authors: | Haber, Anna C; Sax, Ulrich; Prasser, Fabian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Review |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677485/ https://www.ncbi.nlm.nih.gov/pubmed/36215114 http://dx.doi.org/10.1093/bib/bbac440 |
Field | Value |
---|---|
author | Haber, Anna C; Sax, Ulrich; Prasser, Fabian |
collection | PubMed |
description | Precision medicine relies on molecular and systems biology methods as well as bidirectional association studies of phenotypes and (high-throughput) genomic data. However, the integrated use of such data often faces obstacles, especially with regard to data protection. An important prerequisite for research data processing is usually informed consent, but collecting consent is not always feasible, in particular when data are to be analyzed retrospectively. For phenotype data, anonymization, i.e. altering data in such a way that individuals cannot be identified, can provide an alternative. Several re-identification attacks have shown that this is a complex task and that simply removing directly identifying attributes such as names is usually not enough. More formal approaches are needed that use mathematical models to quantify risks and guide their reduction. Due to the complexity of these techniques, implementing them from scratch is challenging and not advisable. Open software libraries and tools can provide a robust alternative. However, the range of available anonymization tools is also heterogeneous, and obtaining an overview of their strengths and weaknesses is difficult due to the complexity of the problem space. We therefore performed a systematic review of open anonymization tools for structured phenotype data described in the literature between 1990 and 2021. Through a two-step eligibility assessment process, we selected 13 tools for an in-depth analysis. By comparing the supported anonymization techniques and further aspects, such as maturity, we derive recommendations on which tools to use for anonymizing phenotype datasets with different properties. |
id | pubmed-9677485 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Brief Bioinform (Review) |
date | 2022-10-10 |
rights | © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Review |
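The abstract notes that removing names is usually not enough and that formal models are needed to quantify re-identification risk. As a minimal sketch of what such a model computes, the example below uses k-anonymity with the prosecutor risk model (maximum risk = 1/k, where k is the size of the smallest group of records sharing all quasi-identifier values). This is an assumption chosen for illustration, not a summary of the review's method: the records, the choice of age, sex and postal code as quasi-identifiers, and the `generalize` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical phenotype records: the direct identifier (name) is already removed,
# but age, sex and postal code remain and may be linkable to external data.
records = [
    {"age": 34, "sex": "F", "zip": "37073", "diagnosis": "asthma"},
    {"age": 36, "sex": "F", "zip": "37075", "diagnosis": "diabetes"},
    {"age": 52, "sex": "M", "zip": "37081", "diagnosis": "asthma"},
    {"age": 58, "sex": "M", "zip": "37083", "diagnosis": "hypertension"},
    {"age": 31, "sex": "F", "zip": "37077", "diagnosis": "migraine"},
    {"age": 55, "sex": "M", "zip": "37085", "diagnosis": "diabetes"},
]

# Assumed quasi-identifiers for this illustration.
QUASI_IDENTIFIERS = ("age", "sex", "zip")


def k_anonymity(rows, qis=QUASI_IDENTIFIERS):
    """Return k: the size of the smallest equivalence class over the quasi-identifiers."""
    classes = Counter(tuple(row[q] for q in qis) for row in rows)
    return min(classes.values())


def max_prosecutor_risk(rows, qis=QUASI_IDENTIFIERS):
    """Highest re-identification probability of any record under the prosecutor model (1/k)."""
    return 1.0 / k_anonymity(rows, qis)


def generalize(row):
    """Hypothetical generalization: 10-year age bands and 3-digit postal code prefixes."""
    low = row["age"] // 10 * 10
    return {**row, "age": f"{low}-{low + 9}", "zip": row["zip"][:3] + "**"}


print(k_anonymity(records), max_prosecutor_risk(records))          # prints: 1 1.0
generalized = [generalize(r) for r in records]
print(k_anonymity(generalized), max_prosecutor_risk(generalized))  # prints: 3 0.3333333333333333
```

Without names, every record is still unique on its quasi-identifiers (k = 1), so each one could be re-identified with certainty by an attacker who knows it is in the data. Coarsening age and postal code merges records into groups of three and lowers the maximum risk to 1/3. Dedicated anonymization tools automate this kind of trade-off between risk and data utility far more thoroughly than this sketch.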