
Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study

BACKGROUND: Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially...

Bibliographic Details
Main authors:	Kumar, Sajit, Nanelia, Alicia, Mariappan, Ragunathan, Rajagopal, Adithya, Rajan, Vaibhav
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8814927/ https://www.ncbi.nlm.nih.gov/pubmed/35049514 http://dx.doi.org/10.2196/28842

author	Kumar, Sajit; Nanelia, Alicia; Mariappan, Ragunathan; Rajagopal, Adithya; Rajan, Vaibhav
collection	PubMed
description	BACKGROUND: Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network–based autoencoders to learn patient representations, primarily from clinical notes in electronic medical records (EMRs). Knowledge graphs (KGs), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature and provide complementary information to EMR data that have been found to provide valuable predictive signals. OBJECTIVE: This study aims to evaluate the efficacy of collective matrix factorization (CMF), both the classical variant and a recent neural architecture called deep CMF (DCMF), in integrating heterogeneous data sources from EMR and KG to obtain patient representations for clinical decision support tasks. METHODS: Using a recent formulation for obtaining graph representations through matrix factorization within the context of CMF, we infused auxiliary information during patient representation learning. We also extended the DCMF architecture to create a task-specific end-to-end model that learns to simultaneously find effective patient representations and predictions. We compared the efficacy of such a model to that of first learning unsupervised representations and then independently learning a predictive model. We evaluated patient representation learning using CMF-based methods and autoencoders for 2 clinical decision support tasks on a large EMR data set. RESULTS: Our experiments show that DCMF provides a seamless way for integrating multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. Its performance in single-source settings is comparable with that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous nonneural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance. CONCLUSIONS: Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources and combine information from EMR data and KGs. Furthermore, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations.
format	Online Article Text
id	pubmed-8814927
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-8814927 2022-02-08 Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study Kumar, Sajit Nanelia, Alicia Mariappan, Ragunathan Rajagopal, Adithya Rajan, Vaibhav JMIR Med Inform Original Paper JMIR Publications 2022-01-20 /pmc/articles/PMC8814927/ /pubmed/35049514 http://dx.doi.org/10.2196/28842 Text en ©Sajit Kumar, Alicia Nanelia, Ragunathan Mariappan, Adithya Rajagopal, Vaibhav Rajan. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 20.01.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8814927/ https://www.ncbi.nlm.nih.gov/pubmed/35049514 http://dx.doi.org/10.2196/28842

Similar Items