Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients
Anonymization has the potential to foster the sharing of medical data. State-of-the-art methods use mathematical models to modify data to reduce privacy risks. However, the degree of protection must be balanced against the impact on statistical properties. We studied an extreme case of this trade-of...
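Main authors: | Koll, Carolin E. M.; Hopff, Sina M.; Meurers, Thierry; Lee, Chin Huang; Kohls, Mirjam; Stellbrink, Christoph; Thibeault, Charlotte; Reinke, Lennart; Steinbrecher, Sarah; Schreiber, Stefan; Mitrov, Lazar; Frank, Sandra; Miljukov, Olga; Erber, Johanna; Hellmuth, Johannes C.; Reese, Jens-Peter; Steinbeis, Fridolin; Bahmer, Thomas; Hagen, Marina; Meybohm, Patrick; Hansch, Stefan; Vadász, István; Krist, Lilian; Jiru-Hillmann, Steffi; Prasser, Fabian; Vehreschild, Jörg Janne |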
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | Analysis |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9769467/ https://www.ncbi.nlm.nih.gov/pubmed/36543828 http://dx.doi.org/10.1038/s41597-022-01669-9 |
_version_ | 1784854376877129728 |
---|---|
author | Koll, Carolin E. M.; Hopff, Sina M.; Meurers, Thierry; Lee, Chin Huang; Kohls, Mirjam; Stellbrink, Christoph; Thibeault, Charlotte; Reinke, Lennart; Steinbrecher, Sarah; Schreiber, Stefan; Mitrov, Lazar; Frank, Sandra; Miljukov, Olga; Erber, Johanna; Hellmuth, Johannes C.; Reese, Jens-Peter; Steinbeis, Fridolin; Bahmer, Thomas; Hagen, Marina; Meybohm, Patrick; Hansch, Stefan; Vadász, István; Krist, Lilian; Jiru-Hillmann, Steffi; Prasser, Fabian; Vehreschild, Jörg Janne |
author_facet | Koll, Carolin E. M.; Hopff, Sina M.; Meurers, Thierry; Lee, Chin Huang; Kohls, Mirjam; Stellbrink, Christoph; Thibeault, Charlotte; Reinke, Lennart; Steinbrecher, Sarah; Schreiber, Stefan; Mitrov, Lazar; Frank, Sandra; Miljukov, Olga; Erber, Johanna; Hellmuth, Johannes C.; Reese, Jens-Peter; Steinbeis, Fridolin; Bahmer, Thomas; Hagen, Marina; Meybohm, Patrick; Hansch, Stefan; Vadász, István; Krist, Lilian; Jiru-Hillmann, Steffi; Prasser, Fabian; Vehreschild, Jörg Janne |
author_sort | Koll, Carolin E. M. |
collection | PubMed |
description | Anonymization has the potential to foster the sharing of medical data. State-of-the-art methods use mathematical models to modify data to reduce privacy risks. However, the degree of protection must be balanced against the impact on statistical properties. We studied an extreme case of this trade-off: the statistical validity of an open medical dataset based on the German National Pandemic Cohort Network (NAPKON), which was prepared for publication using a strong anonymization procedure. Descriptive statistics and results of regression analyses were compared before and after anonymization of multiple variants of the original dataset. Despite significant differences in value distributions, the statistical bias was found to be small in all cases. In the regression analyses, the median absolute deviations of the estimated adjusted odds ratios for different sample sizes ranged from 0.01 [minimum = 0, maximum = 0.58] to 0.52 [minimum = 0.25, maximum = 0.91]. Disproportionate impact on the statistical properties of data is a common argument against the use of anonymization. Our analysis demonstrates that anonymization can actually preserve validity of statistical results in relatively low-dimensional data. |
format | Online Article Text |
id | pubmed-9769467 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-97694672022-12-22 Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients Koll, Carolin E. M. Hopff, Sina M. Meurers, Thierry Lee, Chin Huang Kohls, Mirjam Stellbrink, Christoph Thibeault, Charlotte Reinke, Lennart Steinbrecher, Sarah Schreiber, Stefan Mitrov, Lazar Frank, Sandra Miljukov, Olga Erber, Johanna Hellmuth, Johannes C. Reese, Jens-Peter Steinbeis, Fridolin Bahmer, Thomas Hagen, Marina Meybohm, Patrick Hansch, Stefan Vadász, István Krist, Lilian Jiru-Hillmann, Steffi Prasser, Fabian Vehreschild, Jörg Janne Sci Data Analysis Anonymization has the potential to foster the sharing of medical data. State-of-the-art methods use mathematical models to modify data to reduce privacy risks. However, the degree of protection must be balanced against the impact on statistical properties. We studied an extreme case of this trade-off: the statistical validity of an open medical dataset based on the German National Pandemic Cohort Network (NAPKON), which was prepared for publication using a strong anonymization procedure. Descriptive statistics and results of regression analyses were compared before and after anonymization of multiple variants of the original dataset. Despite significant differences in value distributions, the statistical bias was found to be small in all cases. In the regression analyses, the median absolute deviations of the estimated adjusted odds ratios for different sample sizes ranged from 0.01 [minimum = 0, maximum = 0.58] to 0.52 [minimum = 0.25, maximum = 0.91]. Disproportionate impact on the statistical properties of data is a common argument against the use of anonymization. Our analysis demonstrates that anonymization can actually preserve validity of statistical results in relatively low-dimensional data. 
Nature Publishing Group UK 2022-12-21 /pmc/articles/PMC9769467/ /pubmed/36543828 http://dx.doi.org/10.1038/s41597-022-01669-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Analysis Koll, Carolin E. M. Hopff, Sina M. Meurers, Thierry Lee, Chin Huang Kohls, Mirjam Stellbrink, Christoph Thibeault, Charlotte Reinke, Lennart Steinbrecher, Sarah Schreiber, Stefan Mitrov, Lazar Frank, Sandra Miljukov, Olga Erber, Johanna Hellmuth, Johannes C. Reese, Jens-Peter Steinbeis, Fridolin Bahmer, Thomas Hagen, Marina Meybohm, Patrick Hansch, Stefan Vadász, István Krist, Lilian Jiru-Hillmann, Steffi Prasser, Fabian Vehreschild, Jörg Janne Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients |
title | Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients |
title_full | Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients |
title_fullStr | Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients |
title_full_unstemmed | Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients |
title_short | Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients |
title_sort | statistical biases due to anonymization evaluated in an open clinical dataset from covid-19 patients |
topic | Analysis |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9769467/ https://www.ncbi.nlm.nih.gov/pubmed/36543828 http://dx.doi.org/10.1038/s41597-022-01669-9 |
work_keys_str_mv | AT kollcarolinem statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT hopffsinam statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT meurersthierry statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT leechinhuang statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT kohlsmirjam statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT stellbrinkchristoph statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT thibeaultcharlotte statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT reinkelennart statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT steinbrechersarah statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT schreiberstefan statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT mitrovlazar statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT franksandra statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT miljukovolga statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT erberjohanna statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT hellmuthjohannesc statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT reesejenspeter statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT steinbeisfridolin statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT bahmerthomas statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT hagenmarina statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT meybohmpatrick statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT hanschstefan statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT vadaszistvan statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT kristlilian statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT jiruhillmannsteffi statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT prasserfabian statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT vehreschildjorgjanne statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients AT statisticalbiasesduetoanonymizationevaluatedinanopenclinicaldatasetfromcovid19patients |