Addressing data privacy in matched studies via virtual pooling

BACKGROUND: Data confidentiality and shared use of research data are two desirable but sometimes conflicting goals in research with multi-center studies and distributed data. While ideal for straightforward analysis, confidentiality restrictions forbid creation of a single dataset that includes cova...

Bibliographic Details
Main Authors:	Saha-Chaudhuri, P., Weinberg, C.R.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5590217/ https://www.ncbi.nlm.nih.gov/pubmed/28882105 http://dx.doi.org/10.1186/s12874-017-0419-0

author	Saha-Chaudhuri, P. Weinberg, C.R.
collection	PubMed
description	BACKGROUND: Data confidentiality and shared use of research data are two desirable but sometimes conflicting goals in research with multi-center studies and distributed data. Although a single dataset that includes covariate information for all participants would be ideal for straightforward analysis, confidentiality restrictions forbid its creation. Current approaches such as aggregate data sharing, distributed regression, meta-analysis and score-based methods can have important limitations. METHODS: We propose a novel application of an existing epidemiologic tool, specimen pooling, to enable confidentiality-preserving analysis of data arising from a matched case-control, multi-center design. Instead of pooling specimens prior to assay, we apply the methodology to virtually pool (aggregate) covariates within nodes. Such virtual pooling retains most of the information used in an analysis with individual data, and since individual participant data is not shared externally, within-node virtual pooling preserves data confidentiality. We show that aggregated covariate levels can be used in a conditional logistic regression model to estimate individual-level odds ratios of interest. RESULTS: The parameter estimates from the standard conditional logistic regression are compared to the estimates based on a conditional logistic regression model with aggregated data. The parameter estimates are shown to be similar to those without pooling and to have comparable standard errors and confidence interval coverage. CONCLUSIONS: Virtual data pooling can be used to maintain confidentiality of data from multi-center studies and can be particularly useful in research with large-scale distributed data.
format	Online Article Text
id	pubmed-5590217
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5590217 2017-09-13 Addressing data privacy in matched studies via virtual pooling Saha-Chaudhuri, P. Weinberg, C.R. BMC Med Res Methodol Research Article BioMed Central 2017-09-07 /pmc/articles/PMC5590217/ /pubmed/28882105 http://dx.doi.org/10.1186/s12874-017-0419-0 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Addressing data privacy in matched studies via virtual pooling
topic	Research Article
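
The METHODS summary in the description field (aggregate covariates within a node, then fit a conditional logistic regression to the aggregates) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes 1:1 matched case-control pairs, pools formed from consecutive pairs within one node, and a plain Newton-Raphson solver; the function names `virtual_pool` and `fit_conditional_logit` and the toy numbers are hypothetical. Under those assumptions, each pool contributes the term exp(β·S_case) / (exp(β·S_case) + exp(β·S_control)) to the conditional likelihood, where S denotes a within-pool covariate sum, so only the sums need to leave the node.

```python
import numpy as np


def virtual_pool(case_x, control_x, pool_size):
    """Sum covariates over consecutive groups of `pool_size` matched pairs.

    case_x, control_x: arrays of shape (n_pairs, n_covariates), row i of each
    belonging to the same matched pair. Pairs that do not fill a complete
    pool are dropped (an assumption of this sketch).
    """
    case_x = np.asarray(case_x, dtype=float)
    control_x = np.asarray(control_x, dtype=float)
    n_pools = case_x.shape[0] // pool_size
    keep = n_pools * pool_size
    shape = (n_pools, pool_size, case_x.shape[1])
    pooled_cases = case_x[:keep].reshape(shape).sum(axis=1)
    pooled_controls = control_x[:keep].reshape(shape).sum(axis=1)
    return pooled_cases, pooled_controls


def fit_conditional_logit(case_x, control_x, n_iter=25):
    """Maximize sum_k log sigmoid(beta . (case_k - control_k)) by Newton-Raphson.

    Works on individual pairs or on pooled sums, since both enter the
    conditional likelihood through the case-minus-control difference.
    """
    d = np.asarray(case_x, dtype=float) - np.asarray(control_x, dtype=float)
    beta = np.zeros(d.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(d @ beta)))
        grad = d.T @ (1.0 - p)                      # score vector
        info = (d * (p * (1.0 - p))[:, None]).T @ d  # observed information
        beta = beta + np.linalg.solve(info, grad)
    return beta


# Toy data for one node: eight matched pairs, one covariate (made-up values).
case_x = np.array([[1.2], [0.4], [2.0], [0.1], [1.5], [0.9], [0.3], [1.1]])
control_x = np.array([[0.5], [0.9], [1.0], [0.6], [0.2], [1.4], [0.8], [0.7]])

# Only these pooled sums would be shared outside the node.
pooled_cases, pooled_controls = virtual_pool(case_x, control_x, pool_size=2)

beta_individual = fit_conditional_logit(case_x, control_x)
beta_pooled = fit_conditional_logit(pooled_cases, pooled_controls)
```

With so few pairs the two estimates can differ noticeably; the comparison the abstract reports concerns realistic sample sizes. In practice one would also want standard errors (the inverse of `info` at convergence) and a check for separation, both omitted here for brevity.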
