
A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks

INTRODUCTION: Patient privacy and data security concerns often limit the feasibility of pooling patient-level data from multiple sources for analysis. Distributed data networks (DDNs) that employ privacy-protecting analytical methods, such as distributed regression analysis (DRA), can mitigate these...

Bibliographic Details
Main authors: Her, Qoua L., Malenfant, Jessica M., Malek, Sarah, Vilk, Yury, Young, Jessica, Li, Lingling, Brown, Jeffery, Toh, Sengwee
Format: Online Article Text
Language: English
Published: Ubiquity Press 2018
Subjects: Model/Framework
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6078121/
https://www.ncbi.nlm.nih.gov/pubmed/30094283
http://dx.doi.org/10.5334/egems.209
author Her, Qoua L.
Malenfant, Jessica M.
Malek, Sarah
Vilk, Yury
Young, Jessica
Li, Lingling
Brown, Jeffery
Toh, Sengwee
collection PubMed
description INTRODUCTION: Patient privacy and data security concerns often limit the feasibility of pooling patient-level data from multiple sources for analysis. Distributed data networks (DDNs) that employ privacy-protecting analytical methods, such as distributed regression analysis (DRA), can mitigate these concerns. However, DRA is not routinely implemented in large DDNs. OBJECTIVE: We describe the design and implementation of a process framework and query workflow that allow automatable DRA in real-world DDNs that use PopMedNet™, an open-source distributed networking software platform. METHODS: We surveyed and catalogued existing hardware and software configurations at all data partners in the Sentinel System, a PopMedNet-driven DDN. Key guiding principles for the design included minimal disruptions to the current PopMedNet query workflow and minimal modifications to data partners’ hardware configurations and software requirements. RESULTS: We developed and implemented a three-step process framework and PopMedNet query workflow that enables automatable DRA: 1) assembling a de-identified patient-level dataset at each data partner, 2) distributing a DRA package to data partners for local iterative analysis, and 3) iteratively transferring intermediate files between data partners and analysis center. The DRA query workflow is agnostic to statistical software, accommodates different regression models, and allows different levels of user-specified automation. DISCUSSION: The process framework can be generalized to and the query workflow can be adopted by other PopMedNet-based DDNs. CONCLUSION: DRA has great potential to change the paradigm of data analysis in DDNs. Successful implementation of DRA in Sentinel will facilitate adoption of the analytic approach in other DDNs.
format Online
Article
Text
id pubmed-6078121
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Ubiquity Press
record_format MEDLINE/PubMed
spelling pubmed-6078121 2018-08-09 EGEMS (Wash DC) Model/Framework Ubiquity Press 2018-05-25 /pmc/articles/PMC6078121/ /pubmed/30094283 http://dx.doi.org/10.5334/egems.209 Text en Copyright: © 2018 The Author(s) http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.
title A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks
topic Model/Framework
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6078121/
https://www.ncbi.nlm.nih.gov/pubmed/30094283
http://dx.doi.org/10.5334/egems.209