Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients

Anonymization has the potential to foster the sharing of medical data. State-of-the-art methods use mathematical models to modify data to reduce privacy risks. However, the degree of protection must be balanced against the impact on statistical properties. We studied an extreme case of this trade-of...

Bibliographic Details
Main authors: Koll, Carolin E. M., Hopff, Sina M., Meurers, Thierry, Lee, Chin Huang, Kohls, Mirjam, Stellbrink, Christoph, Thibeault, Charlotte, Reinke, Lennart, Steinbrecher, Sarah, Schreiber, Stefan, Mitrov, Lazar, Frank, Sandra, Miljukov, Olga, Erber, Johanna, Hellmuth, Johannes C., Reese, Jens-Peter, Steinbeis, Fridolin, Bahmer, Thomas, Hagen, Marina, Meybohm, Patrick, Hansch, Stefan, Vadász, István, Krist, Lilian, Jiru-Hillmann, Steffi, Prasser, Fabian, Vehreschild, Jörg Janne
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9769467/
https://www.ncbi.nlm.nih.gov/pubmed/36543828
http://dx.doi.org/10.1038/s41597-022-01669-9
author Koll, Carolin E. M.
Hopff, Sina M.
Meurers, Thierry
Lee, Chin Huang
Kohls, Mirjam
Stellbrink, Christoph
Thibeault, Charlotte
Reinke, Lennart
Steinbrecher, Sarah
Schreiber, Stefan
Mitrov, Lazar
Frank, Sandra
Miljukov, Olga
Erber, Johanna
Hellmuth, Johannes C.
Reese, Jens-Peter
Steinbeis, Fridolin
Bahmer, Thomas
Hagen, Marina
Meybohm, Patrick
Hansch, Stefan
Vadász, István
Krist, Lilian
Jiru-Hillmann, Steffi
Prasser, Fabian
Vehreschild, Jörg Janne
author_sort Koll, Carolin E. M.
collection PubMed
description Anonymization has the potential to foster the sharing of medical data. State-of-the-art methods use mathematical models to modify data to reduce privacy risks. However, the degree of protection must be balanced against the impact on statistical properties. We studied an extreme case of this trade-off: the statistical validity of an open medical dataset based on the German National Pandemic Cohort Network (NAPKON), which was prepared for publication using a strong anonymization procedure. Descriptive statistics and results of regression analyses were compared before and after anonymization of multiple variants of the original dataset. Despite significant differences in value distributions, the statistical bias was found to be small in all cases. In the regression analyses, the median absolute deviations of the estimated adjusted odds ratios for different sample sizes ranged from 0.01 [minimum = 0, maximum = 0.58] to 0.52 [minimum = 0.25, maximum = 0.91]. Disproportionate impact on the statistical properties of data is a common argument against the use of anonymization. Our analysis demonstrates that anonymization can actually preserve validity of statistical results in relatively low-dimensional data.
format Online
Article
Text
id pubmed-9769467
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9769467 2022-12-22 Sci Data Analysis
Nature Publishing Group UK 2022-12-21 /pmc/articles/PMC9769467/ /pubmed/36543828 http://dx.doi.org/10.1038/s41597-022-01669-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients
topic Analysis