
Distributed Regression Analysis Application in Large Distributed Data Networks: Analysis of Precision and Operational Performance

BACKGROUND: A distributed data network approach combined with distributed regression analysis (DRA) can reduce the risk of disclosing sensitive individual and institutional information in multicenter studies. However, software that facilitates large-scale and efficient implementation of DRA is limit...

Bibliographic Details
Main authors: Her, Qoua, Malenfant, Jessica, Zhang, Zilu, Vilk, Yury, Young, Jessica, Tabano, David, Hamilton, Jack, Johnson, Ron, Raebel, Marsha, Boudreau, Denise, Toh, Sengwee
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303834/
https://www.ncbi.nlm.nih.gov/pubmed/32496200
http://dx.doi.org/10.2196/15073
_version_ 1783548144819109888
author Her, Qoua
Malenfant, Jessica
Zhang, Zilu
Vilk, Yury
Young, Jessica
Tabano, David
Hamilton, Jack
Johnson, Ron
Raebel, Marsha
Boudreau, Denise
Toh, Sengwee
author_sort Her, Qoua
collection PubMed
description BACKGROUND: A distributed data network approach combined with distributed regression analysis (DRA) can reduce the risk of disclosing sensitive individual and institutional information in multicenter studies. However, software that facilitates large-scale and efficient implementation of DRA is limited. OBJECTIVE: This study aimed to assess the precision and operational performance of a DRA application comprising a SAS-based DRA package and a file transfer workflow developed within the open-source distributed networking software PopMedNet in a horizontally partitioned distributed data network. METHODS: We executed the SAS-based DRA package to perform distributed linear, logistic, and Cox proportional hazards regression analysis on a real-world test case with 3 data partners. We used PopMedNet to iteratively and automatically transfer highly summarized information between the data partners and the analysis center. We compared the DRA results with the results from standard SAS procedures executed on the pooled individual-level dataset to evaluate the precision of the SAS-based DRA package. We computed the execution time of each step in the workflow to evaluate the operational performance of the PopMedNet-driven file transfer workflow. RESULTS: All DRA results were precise (<10^−12), and DRA model fit curves were identical or similar to those obtained from the corresponding pooled individual-level data analyses. All regression models required less than 20 min for full end-to-end execution. CONCLUSIONS: We integrated a SAS-based DRA package with PopMedNet and successfully tested the new capability within an active distributed data network. The study demonstrated the validity and feasibility of using DRA to enable more privacy-protecting analysis in multicenter studies.
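The idea in the METHODS paragraph, that data partners send only highly summarized information and the analysis center still recovers the pooled estimate, can be illustrated for the linear case. The following is a minimal sketch in Python, not the SAS-based DRA package evaluated in the study: it assumes each partner shares only the cross-product summaries X'X and X'y, and every function name and data value here is hypothetical and made up for illustration. Logistic and Cox models generally need iterative exchanges, which this sketch does not cover.

```python
"""Sketch: distributed linear regression on horizontally partitioned data.

Assumption (not from the source): each data partner shares only the
aggregated cross-products X'X and X'y, never individual-level rows.
"""


def local_summaries(rows, outcomes):
    """Run at a data partner: return (X'X, X'y) with an intercept column."""
    p = len(rows[0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for row, y in zip(rows, outcomes):
        x = [1.0] + list(row)  # prepend the intercept term
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(summaries):
    """Run at the analysis center: add the per-site summaries element-wise."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)]
           for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return xtx, xty


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


def fit_distributed(sites):
    """Coefficients from site-level summaries only."""
    return solve(*combine([local_summaries(x, y) for x, y in sites]))


def fit_pooled(sites):
    """Reference fit on the pooled individual-level rows."""
    rows = [r for x, _ in sites for r in x]
    ys = [v for _, y in sites for v in y]
    return solve(*local_summaries(rows, ys))


# Made-up data for three hypothetical data partners: (covariates, outcomes).
SITES = [
    ([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]], [3.1, 6.9, 7.2]),
    ([[4.0, 1.0], [5.0, 0.0]], [11.0, 10.8]),
    ([[6.0, 1.0], [7.0, 1.0], [8.0, 0.0]], [15.1, 16.9, 17.0]),
]

if __name__ == "__main__":
    print("distributed:", fit_distributed(SITES))
    print("pooled:     ", fit_pooled(SITES))
```

Because the cross-products are plain sums over rows, adding them across sites gives the same normal equations as the pooled dataset, so the two fits should agree up to floating-point rounding. This mirrors the comparison against pooled individual-level analysis described in the abstract, without reproducing its actual procedure.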
format Online
Article
Text
id pubmed-7303834
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7303834 2020-06-24 Distributed Regression Analysis Application in Large Distributed Data Networks: Analysis of Precision and Operational Performance Her, Qoua Malenfant, Jessica Zhang, Zilu Vilk, Yury Young, Jessica Tabano, David Hamilton, Jack Johnson, Ron Raebel, Marsha Boudreau, Denise Toh, Sengwee JMIR Med Inform Original Paper JMIR Publications 2020-06-04 /pmc/articles/PMC7303834/ /pubmed/32496200 http://dx.doi.org/10.2196/15073 Text en ©Qoua Her, Jessica Malenfant, Zilu Zhang, Yury Vilk, Jessica Young, David Tabano, Jack Hamilton, Ron Johnson, Marsha Raebel, Denise Boudreau, Sengwee Toh. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 04.06.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Distributed Regression Analysis Application in Large Distributed Data Networks: Analysis of Precision and Operational Performance
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303834/
https://www.ncbi.nlm.nih.gov/pubmed/32496200
http://dx.doi.org/10.2196/15073