
Distributed Tensor Decomposition for Large Scale Health Analytics

In the past few decades, there has been rapid growth in quantity and variety of healthcare data. These large sets of data are usually high dimensional (e.g. patients, their diagnoses, and medications to treat their diagnoses) and cannot be adequately represented as matrices. Thus, many existing algo...


Bibliographic Details
Main Authors: He, Huan, Henderson, Jette, Ho, Joyce C.
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6563812/
https://www.ncbi.nlm.nih.gov/pubmed/31198910
http://dx.doi.org/10.1145/3308558.3313548
_version_ 1783426615975346176
author He, Huan
Henderson, Jette
Ho, Joyce C.
author_facet He, Huan
Henderson, Jette
Ho, Joyce C.
author_sort He, Huan
collection PubMed
description In the past few decades, there has been rapid growth in quantity and variety of healthcare data. These large sets of data are usually high dimensional (e.g. patients, their diagnoses, and medications to treat their diagnoses) and cannot be adequately represented as matrices. Thus, many existing algorithms cannot analyze them. To accommodate these high dimensional data, tensor factorization, which can be viewed as a higher-order extension of methods like PCA, has attracted much attention and emerged as a promising solution. However, tensor factorization is a computationally expensive task, and existing methods developed to factor large tensors are not flexible enough for real-world situations. To address this scaling problem more efficiently, we introduce SGranite, a distributed, scalable, and sparse tensor factorization method fit through stochastic gradient descent. SGranite offers three contributions: (1) Scalability: it employs a block partitioning and parallel processing design and thus scales to large tensors, (2) Accuracy: we show that our method can achieve results faster without sacrificing the quality of the tensor decomposition, and (3) FlexibleConstraints: we show our approach can encompass various kinds of constraints including l2 norm, l1 norm, and logistic regularization. We demonstrate SGranite’s capabilities in two real-world use cases. In the first, we use Google searches for flu-like symptoms to characterize and predict influenza patterns. In the second, we use SGranite to extract clinically interesting sets (i.e., phenotypes) of patients from electronic health records. Through these case studies, we show SGranite has the potential to be used to rapidly characterize, predict, and manage large multimodal datasets, thereby promising a novel, data-driven solution that can benefit very large segments of the population.
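The abstract describes a tensor factorization fit through stochastic gradient descent with optional norm penalties. As a rough illustration of that general idea only, the following single-machine sketch fits a CP (sum of rank-one terms) model to the entries of a small three-mode tensor with SGD and an l2 penalty. It is not the authors' SGranite code: it omits the block partitioning, distribution, and sparsity handling, and the function names (cp_sgd, reconstruct, mse), the hyperparameters, and the toy data are all assumptions made for this example.

```python
import numpy as np


def cp_sgd(entries, shape, rank=2, lr=0.05, l2=1e-3, epochs=200, seed=0):
    """Fit a rank-`rank` CP model to (i, j, k, value) entries with plain SGD.

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    """
    rng = np.random.default_rng(seed)
    # One factor matrix per mode, with a positive random initialization.
    factors = [rng.uniform(0.1, 1.0, size=(n, rank)) for n in shape]
    entries = list(entries)
    for _ in range(epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for idx in rng.permutation(len(entries)):
            i, j, k, x = entries[idx]
            a = factors[0][i].copy()
            b = factors[1][j].copy()
            c = factors[2][k].copy()
            err = np.sum(a * b * c) - x  # residual for this single entry
            # Gradient of 0.5 * err**2 plus an l2 penalty on the touched rows.
            factors[0][i] -= lr * (err * b * c + l2 * a)
            factors[1][j] -= lr * (err * a * c + l2 * b)
            factors[2][k] -= lr * (err * a * b + l2 * c)
    return factors


def reconstruct(factors):
    """Rebuild the dense three-mode tensor from its CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", factors[0], factors[1], factors[2])


def mse(factors, tensor):
    """Mean squared reconstruction error against a dense tensor."""
    return float(np.mean((reconstruct(factors) - tensor) ** 2))


# Toy data (made up): a rank-one "patients x diagnoses x medications" tensor.
toy = np.einsum("i,j,k->ijk",
                np.array([1.0, 0.5, 0.2]),
                np.array([0.3, 1.0]),
                np.array([0.8, 0.4, 0.6, 1.0]))
toy_entries = [(i, j, k, toy[i, j, k]) for i, j, k in np.ndindex(*toy.shape)]

fitted = cp_sgd(toy_entries, toy.shape)
print("reconstruction MSE:", mse(fitted, toy))
```

Replacing the l2 term with a different penalty gradient (for example a subgradient of the l1 norm) is one way such a loop could accommodate other constraints, though how SGranite itself does this is not stated in this record.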
format Online
Article
Text
id pubmed-6563812
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-6563812 2019-06-13 Distributed Tensor Decomposition for Large Scale Health Analytics He, Huan Henderson, Jette Ho, Joyce C. Proc Int World Wide Web Conf Article In the past few decades, there has been rapid growth in quantity and variety of healthcare data. These large sets of data are usually high dimensional (e.g. patients, their diagnoses, and medications to treat their diagnoses) and cannot be adequately represented as matrices. Thus, many existing algorithms cannot analyze them. To accommodate these high dimensional data, tensor factorization, which can be viewed as a higher-order extension of methods like PCA, has attracted much attention and emerged as a promising solution. However, tensor factorization is a computationally expensive task, and existing methods developed to factor large tensors are not flexible enough for real-world situations. To address this scaling problem more efficiently, we introduce SGranite, a distributed, scalable, and sparse tensor factorization method fit through stochastic gradient descent. SGranite offers three contributions: (1) Scalability: it employs a block partitioning and parallel processing design and thus scales to large tensors, (2) Accuracy: we show that our method can achieve results faster without sacrificing the quality of the tensor decomposition, and (3) FlexibleConstraints: we show our approach can encompass various kinds of constraints including l2 norm, l1 norm, and logistic regularization. We demonstrate SGranite’s capabilities in two real-world use cases. In the first, we use Google searches for flu-like symptoms to characterize and predict influenza patterns. In the second, we use SGranite to extract clinically interesting sets (i.e., phenotypes) of patients from electronic health records. Through these case studies, we show SGranite has the potential to be used to rapidly characterize, predict, and manage large multimodal datasets, thereby promising a novel, data-driven solution that can benefit very large segments of the population. 2019-05 /pmc/articles/PMC6563812/ /pubmed/31198910 http://dx.doi.org/10.1145/3308558.3313548 Text en http://creativecommons.org/licenses/by/4.0/ This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
spellingShingle Article
He, Huan
Henderson, Jette
Ho, Joyce C.
Distributed Tensor Decomposition for Large Scale Health Analytics
title Distributed Tensor Decomposition for Large Scale Health Analytics
title_full Distributed Tensor Decomposition for Large Scale Health Analytics
title_fullStr Distributed Tensor Decomposition for Large Scale Health Analytics
title_full_unstemmed Distributed Tensor Decomposition for Large Scale Health Analytics
title_short Distributed Tensor Decomposition for Large Scale Health Analytics
title_sort distributed tensor decomposition for large scale health analytics
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6563812/
https://www.ncbi.nlm.nih.gov/pubmed/31198910
http://dx.doi.org/10.1145/3308558.3313548
work_keys_str_mv AT hehuan distributedtensordecompositionforlargescalehealthanalytics
AT hendersonjette distributedtensordecompositionforlargescalehealthanalytics
AT hojoycec distributedtensordecompositionforlargescalehealthanalytics