
Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources

BACKGROUND: Data sharing has been a big challenge in biomedical informatics because of privacy concerns. Contextual embedding models have demonstrated a very strong representative capability to describe medical concepts (and their context), and they have shown promise as an alternative way to suppor...


Bibliographic Details
Main Authors:	Huang, Yingxiang; Lee, Junghye; Wang, Shuang; Sun, Jimeng; Liu, Hongfang; Jiang, Xiaoqian
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2018
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5981054/ https://www.ncbi.nlm.nih.gov/pubmed/29769172 http://dx.doi.org/10.2196/medinform.9455

author	Huang, Yingxiang Lee, Junghye Wang, Shuang Sun, Jimeng Liu, Hongfang Jiang, Xiaoqian
collection	PubMed
description	BACKGROUND: Data sharing has been a big challenge in biomedical informatics because of privacy concerns. Contextual embedding models have demonstrated a very strong representative capability to describe medical concepts (and their context), and they have shown promise as an alternative way to support deep-learning applications without the need to disclose original data. However, contextual embedding models acquired from individual hospitals cannot be directly combined because their embedding spaces are different, and naive pooling renders combined embeddings useless. OBJECTIVE: The aim of this study was to present a novel approach to address these issues and to promote sharing representation without sharing data. Without sacrificing privacy, we also aimed to build a global model from representations learned from local private data and synchronize information from multiple sources. METHODS: We propose a methodology that harmonizes different local contextual embeddings into a global model. We used Word2Vec to generate contextual embeddings from each source and Procrustes to fuse different vector models into one common space by using a list of corresponding pairs as anchor points. We performed prediction analysis with harmonized embeddings. RESULTS: We used sequential medical events extracted from the Medical Information Mart for Intensive Care III database to evaluate the proposed methodology in predicting the next likely diagnosis of a new patient using either structured data or unstructured data. Under different experimental scenarios, we confirmed that the global model built from harmonized local models achieves a more accurate prediction than local models and global models built from naive pooling. CONCLUSIONS: Such aggregation of local models using our unique harmonization can serve as the proxy for a global model, combining information from a wide range of institutions and information sources. It allows information unique to a certain hospital to become available to other sites, increasing the fluidity of information flow in health care.
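The METHODS statement above (Procrustes alignment of separately trained embedding spaces through anchor pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which Procrustes variant was used, so the orthogonal form solved by SVD is an assumption, the anchor vectors are synthetic random data standing in for Word2Vec output, and every name (`procrustes_rotation`, `harmonize`, `site_a_anchors`, `site_b_anchors`) is hypothetical.

```python
import numpy as np


def procrustes_rotation(source_anchors, target_anchors):
    """Orthogonal R minimizing ||source_anchors @ R - target_anchors||_F.

    Assumption: the orthogonal Procrustes variant, solved in closed form
    from the SVD of the cross-covariance of the anchor vectors.
    """
    u, _, vt = np.linalg.svd(source_anchors.T @ target_anchors)
    return u @ vt


def harmonize(local_vectors, rotation):
    """Map vectors from one site's embedding space into the common space."""
    return local_vectors @ rotation


# Synthetic stand-ins for anchor concepts embedded at two sites.
rng = np.random.default_rng(0)
dim, n_anchors = 5, 20
site_b_anchors = rng.normal(size=(n_anchors, dim))  # reference space

# Site A holds the same concepts in an arbitrarily rotated space, which is
# why the raw vectors of the two sites cannot simply be pooled.
true_q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
site_a_anchors = site_b_anchors @ true_q.T

rotation = procrustes_rotation(site_a_anchors, site_b_anchors)
aligned = harmonize(site_a_anchors, rotation)
alignment_error = float(np.linalg.norm(aligned - site_b_anchors))
```

In this toy setup the two spaces differ only by an exact rotation, so the alignment error is close to zero. Embeddings trained on different real corpora would align only approximately, and the fitted rotation would then be applied to all of a site's vectors rather than to the anchors alone.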
format	Online Article Text
id	pubmed-5981054
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-5981054 2018-06-01 Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources Huang, Yingxiang Lee, Junghye Wang, Shuang Sun, Jimeng Liu, Hongfang Jiang, Xiaoqian JMIR Med Inform Original Paper JMIR Publications 2018-05-16 /pmc/articles/PMC5981054/ /pubmed/29769172 http://dx.doi.org/10.2196/medinform.9455 Text en ©Yingxiang Huang, Junghye Lee, Shuang Wang, Jimeng Sun, Hongfang Liu, Xiaoqian Jiang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 16.05.2018. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5981054/ https://www.ncbi.nlm.nih.gov/pubmed/29769172 http://dx.doi.org/10.2196/medinform.9455


Similar Items