A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks
INTRODUCTION: Patient privacy and data security concerns often limit the feasibility of pooling patient-level data from multiple sources for analysis. Distributed data networks (DDNs) that employ privacy-protecting analytical methods, such as distributed regression analysis (DRA), can mitigate these...
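Main authors: | Her, Qoua L.; Malenfant, Jessica M.; Malek, Sarah; Vilk, Yury; Young, Jessica; Li, Lingling; Brown, Jeffery; Toh, Sengwee |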
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Ubiquity Press, 2018 |
Subjects: | Model/Framework |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6078121/ https://www.ncbi.nlm.nih.gov/pubmed/30094283 http://dx.doi.org/10.5334/egems.209 |
_version_ | 1783345039235088384 |
---|---|
author | Her, Qoua L. Malenfant, Jessica M. Malek, Sarah Vilk, Yury Young, Jessica Li, Lingling Brown, Jeffery Toh, Sengwee |
author_facet | Her, Qoua L. Malenfant, Jessica M. Malek, Sarah Vilk, Yury Young, Jessica Li, Lingling Brown, Jeffery Toh, Sengwee |
author_sort | Her, Qoua L. |
collection | PubMed |
description | INTRODUCTION: Patient privacy and data security concerns often limit the feasibility of pooling patient-level data from multiple sources for analysis. Distributed data networks (DDNs) that employ privacy-protecting analytical methods, such as distributed regression analysis (DRA), can mitigate these concerns. However, DRA is not routinely implemented in large DDNs. OBJECTIVE: We describe the design and implementation of a process framework and query workflow that allow automatable DRA in real-world DDNs that use PopMedNet™, an open-source distributed networking software platform. METHODS: We surveyed and catalogued existing hardware and software configurations at all data partners in the Sentinel System, a PopMedNet-driven DDN. Key guiding principles for the design included minimal disruptions to the current PopMedNet query workflow and minimal modifications to data partners’ hardware configurations and software requirements. RESULTS: We developed and implemented a three-step process framework and PopMedNet query workflow that enables automatable DRA: 1) assembling a de-identified patient-level dataset at each data partner, 2) distributing a DRA package to data partners for local iterative analysis, and 3) iteratively transferring intermediate files between data partners and analysis center. The DRA query workflow is agnostic to statistical software, accommodates different regression models, and allows different levels of user-specified automation. DISCUSSION: The process framework can be generalized to and the query workflow can be adopted by other PopMedNet-based DDNs. CONCLUSION: DRA has great potential to change the paradigm of data analysis in DDNs. Successful implementation of DRA in Sentinel will facilitate adoption of the analytic approach in other DDNs. |
format | Online Article Text |
id | pubmed-6078121 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Ubiquity Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6078121 2018-08-09 A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks Her, Qoua L. Malenfant, Jessica M. Malek, Sarah Vilk, Yury Young, Jessica Li, Lingling Brown, Jeffery Toh, Sengwee EGEMS (Wash DC) Model/Framework INTRODUCTION: Patient privacy and data security concerns often limit the feasibility of pooling patient-level data from multiple sources for analysis. Distributed data networks (DDNs) that employ privacy-protecting analytical methods, such as distributed regression analysis (DRA), can mitigate these concerns. However, DRA is not routinely implemented in large DDNs. OBJECTIVE: We describe the design and implementation of a process framework and query workflow that allow automatable DRA in real-world DDNs that use PopMedNet™, an open-source distributed networking software platform. METHODS: We surveyed and catalogued existing hardware and software configurations at all data partners in the Sentinel System, a PopMedNet-driven DDN. Key guiding principles for the design included minimal disruptions to the current PopMedNet query workflow and minimal modifications to data partners’ hardware configurations and software requirements. RESULTS: We developed and implemented a three-step process framework and PopMedNet query workflow that enables automatable DRA: 1) assembling a de-identified patient-level dataset at each data partner, 2) distributing a DRA package to data partners for local iterative analysis, and 3) iteratively transferring intermediate files between data partners and analysis center. The DRA query workflow is agnostic to statistical software, accommodates different regression models, and allows different levels of user-specified automation. DISCUSSION: The process framework can be generalized to and the query workflow can be adopted by other PopMedNet-based DDNs. CONCLUSION: DRA has great potential to change the paradigm of data analysis in DDNs. Successful implementation of DRA in Sentinel will facilitate adoption of the analytic approach in other DDNs. Ubiquity Press 2018-05-25 /pmc/articles/PMC6078121/ /pubmed/30094283 http://dx.doi.org/10.5334/egems.209 Text en Copyright: © 2018 The Author(s) http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Model/Framework Her, Qoua L. Malenfant, Jessica M. Malek, Sarah Vilk, Yury Young, Jessica Li, Lingling Brown, Jeffery Toh, Sengwee A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks |
title | A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks |
title_full | A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks |
title_fullStr | A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks |
title_full_unstemmed | A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks |
title_short | A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks |
title_sort | query workflow design to perform automatable distributed regression analysis in large distributed data networks |
topic | Model/Framework |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6078121/ https://www.ncbi.nlm.nih.gov/pubmed/30094283 http://dx.doi.org/10.5334/egems.209 |
work_keys_str_mv | AT herqoual aqueryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT malenfantjessicam aqueryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT maleksarah aqueryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT vilkyury aqueryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT youngjessica aqueryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT lilingling aqueryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT brownjeffery aqueryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT tohsengwee aqueryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT herqoual queryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT malenfantjessicam queryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT maleksarah queryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT vilkyury queryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT youngjessica queryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT lilingling queryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT brownjeffery queryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks AT tohsengwee queryworkflowdesigntoperformautomatabledistributedregressionanalysisinlargedistributeddatanetworks |
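
The three-step framework in the description (local dataset at each data partner, local iterative analysis, iterative exchange of intermediate files with the analysis center) can be illustrated with a small sketch. This is not the PopMedNet or Sentinel implementation, and the record does not say which fitting algorithm the authors use. The sketch assumes a logistic model fitted by Newton-Raphson, and every name in it (`local_summaries`, `solve`, `distributed_logistic`, the toy data) is hypothetical. The point it shows is that each partner returns only aggregate gradient and Hessian contributions, never patient-level rows.

```python
import math


def local_summaries(rows, beta):
    """Run at a data partner: return only aggregate gradient/Hessian terms."""
    p = len(beta)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for x, y in rows:
        eta = sum(b * xi for b, xi in zip(beta, x))
        mu = 1.0 / (1.0 + math.exp(-eta))  # predicted probability
        w = mu * (1.0 - mu)
        for i in range(p):
            grad[i] += x[i] * (y - mu)
            for j in range(p):
                hess[i][j] += w * x[i] * x[j]
    return grad, hess


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def distributed_logistic(partners, n_params, tol=1e-8, max_iter=25):
    """Run at the analysis center: combine partner summaries each iteration."""
    beta = [0.0] * n_params
    for _ in range(max_iter):
        grad = [0.0] * n_params
        hess = [[0.0] * n_params for _ in range(n_params)]
        for rows in partners:  # in practice, one file exchange per partner
            g, h = local_summaries(rows, beta)
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(hess, h)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data (made up): each row is ((intercept, exposure), outcome).
partner_a = [((1, 0), 0), ((1, 0), 0), ((1, 0), 1), ((1, 1), 1), ((1, 1), 0)]
partner_b = [((1, 1), 1), ((1, 1), 1), ((1, 0), 0), ((1, 0), 1), ((1, 1), 0)]

beta_distributed = distributed_logistic([partner_a, partner_b], n_params=2)
beta_pooled = distributed_logistic([partner_a + partner_b], n_params=2)
```

In this sketch the distributed fit and the pooled fit agree up to floating-point rounding, because the summed gradient and Hessian are the same quantities a pooled analysis would compute. A real deployment would replace the inner loop over `partners` with the query workflow's file transfers.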