Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data

The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require...

Bibliographic Details
Main authors: Hong, Chuan, Rush, Everett, Liu, Molei, Zhou, Doudou, Sun, Jiehuan, Sonabend, Aaron, Castro, Victor M., Schubert, Petra, Panickan, Vidul A., Cai, Tianrun, Costa, Lauren, He, Zeling, Link, Nicholas, Hauser, Ronald, Gaziano, J. Michael, Murphy, Shawn N., Ostrouchov, George, Ho, Yuk-Lam, Begoli, Edmon, Lu, Junwei, Cho, Kelly, Liao, Katherine P., Cai, Tianxi
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551205/
https://www.ncbi.nlm.nih.gov/pubmed/34707226
http://dx.doi.org/10.1038/s41746-021-00519-z
_version_ 1784591106284978176
author Hong, Chuan
Rush, Everett
Liu, Molei
Zhou, Doudou
Sun, Jiehuan
Sonabend, Aaron
Castro, Victor M.
Schubert, Petra
Panickan, Vidul A.
Cai, Tianrun
Costa, Lauren
He, Zeling
Link, Nicholas
Hauser, Ronald
Gaziano, J. Michael
Murphy, Shawn N.
Ostrouchov, George
Ho, Yuk-Lam
Begoli, Edmon
Lu, Junwei
Cho, Kelly
Liao, Katherine P.
Cai, Tianxi
author_sort Hong, Chuan
collection PubMed
description The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. In addition, we developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Features identified via KESER resulted in comparable performance to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately than those identified using single-institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses, providing a significant advance in enabling multi-center studies using EHR data.
format Online
Article
Text
id pubmed-8551205
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8551205 2021-10-29 NPJ Digit Med Article Nature Publishing Group UK 2021-10-27 /pmc/articles/PMC8551205/ /pubmed/34707226 http://dx.doi.org/10.1038/s41746-021-00519-z Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
title Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551205/
https://www.ncbi.nlm.nih.gov/pubmed/34707226
http://dx.doi.org/10.1038/s41746-021-00519-z