Grid Binary LOgistic REgression (GLORE): building shared models without sharing data

Bibliographic details
Main authors: Wu, Yuan; Jiang, Xiaoqian; Kim, Jihoon; Ohno-Machado, Lucila
Format: Online article, text
Language: English
Published: BMJ Group, 2012
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3422844/
https://www.ncbi.nlm.nih.gov/pubmed/22511014
http://dx.doi.org/10.1136/amiajnl-2012-000862

Abstract

OBJECTIVE: The classification of complex or rare patterns in clinical and genomic data requires the availability of a large, labeled patient set. While methods that operate on large, centralized data sources have been used extensively, little attention has been paid to whether models such as binary logistic regression (LR) can be developed in a distributed manner, allowing researchers to share models without necessarily sharing patient data.

MATERIAL AND METHODS: Instead of bringing data to a central repository for computation, we bring computation to the data. The Grid Binary LOgistic REgression (GLORE) model integrates decomposable partial elements or non-privacy-sensitive prediction values to obtain the model coefficients, the variance-covariance matrix, the goodness-of-fit test statistic, and the area under the receiver operating characteristic (ROC) curve.

RESULTS: We conducted experiments on both simulated and clinically relevant data, and compared the computational costs of GLORE with those of a traditional LR model estimated using the combined data. We showed that our results are the same as those of LR to a precision of 10^−15. In addition, GLORE is computationally efficient.

LIMITATION: In GLORE, the calculation of coefficient gradients must be synchronized across sites, which requires some effort to ensure the integrity of communication. It is also necessary to ensure that the predictors have the same format and meaning across the data sets.

CONCLUSION: The results suggest that GLORE performs as well as LR and allows data to remain protected at their original sites.
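
The MATERIAL AND METHODS paragraph says GLORE builds the model from decomposable partial elements computed at each site. For logistic regression fitted by Newton-Raphson, the gradient X^T(y − p) and the information matrix X^T W X are sums over patients, so each site can compute its share locally and a coordinator can add the shares together. The Python sketch below illustrates that idea under this assumption. The function names (`site_partials`, `aggregate`, `glore_fit`) and the synthetic data are hypothetical. It is not the authors' implementation, and it omits the goodness-of-fit statistic, the ROC AUC, and any secure-communication layer.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def site_partials(X, y, beta):
    # Runs locally at one site (hypothetical helper). Returns the gradient
    # X^T (y - p) and the information matrix X^T W X, W = diag(p * (1 - p)).
    # Only this d-vector and d x d matrix leave the site, never patient rows.
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]
    for row, label in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, row)))
        w = p * (1.0 - p)
        for j in range(d):
            grad[j] += row[j] * (label - p)
            for k in range(d):
                info[j][k] += w * row[j] * row[k]
    return grad, info


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def aggregate(sites, beta):
    # Coordinator: the partial elements are additive, so just sum them.
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]
    for X, y in sites:
        g, h = site_partials(X, y, beta)
        for j in range(d):
            grad[j] += g[j]
            for k in range(d):
                info[j][k] += h[j][k]
    return grad, info


def glore_fit(sites, d, tol=1e-10, max_iter=50):
    # Newton-Raphson: beta <- beta + (X^T W X)^-1 X^T (y - p), pooled over sites.
    beta = [0.0] * d
    for _ in range(max_iter):
        grad, info = aggregate(sites, beta)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Variance-covariance matrix: inverse of the information matrix at the
    # final beta, built column by column by solving info @ c = e_k.
    _, info = aggregate(sites, beta)
    cols = [solve(info, [float(i == k) for i in range(d)]) for k in range(d)]
    cov = [[cols[k][j] for k in range(d)] for j in range(d)]
    return beta, cov


# Demo on synthetic data (hypothetical, for illustration only).
rng = random.Random(0)
true_beta = [-0.5, 1.0, -2.0]


def make_site(n):
    X, y = [], []
    for _ in range(n):
        row = [1.0, rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(sum(b * x for b, x in zip(true_beta, row)))
        X.append(row)
        y.append(1 if rng.random() < p else 0)
    return X, y


sites = [make_site(n) for n in (80, 120, 100)]
beta_dist, cov = glore_fit(sites, d=3)
pooled = ([r for X, _ in sites for r in X], [v for _, ys in sites for v in ys])
beta_central, _ = glore_fit([pooled], d=3)
print(max(abs(a - b) for a, b in zip(beta_dist, beta_central)))
```

Running the same `glore_fit` on a single pooled "site" stands in for the centralized LR fit. Because the gradient and information matrix are plain sums, the two coefficient vectors differ only by floating-point rounding from the different summation order.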

Journal: J Am Med Inform Assoc (section: Research and Applications)
Published: BMJ Group, 2012-04-17

© 2012, Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non-commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.