Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data
The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require...
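Main authors: | Hong, Chuan; Rush, Everett; Liu, Molei; Zhou, Doudou; Sun, Jiehuan; Sonabend, Aaron; Castro, Victor M.; Schubert, Petra; Panickan, Vidul A.; Cai, Tianrun; Costa, Lauren; He, Zeling; Link, Nicholas; Hauser, Ronald; Gaziano, J. Michael; Murphy, Shawn N.; Ostrouchov, George; Ho, Yuk-Lam; Begoli, Edmon; Lu, Junwei; Cho, Kelly; Liao, Katherine P.; Cai, Tianxi |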
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551205/ https://www.ncbi.nlm.nih.gov/pubmed/34707226 http://dx.doi.org/10.1038/s41746-021-00519-z |
_version_ | 1784591106284978176 |
---|---|
author | Hong, Chuan; Rush, Everett; Liu, Molei; Zhou, Doudou; Sun, Jiehuan; Sonabend, Aaron; Castro, Victor M.; Schubert, Petra; Panickan, Vidul A.; Cai, Tianrun; Costa, Lauren; He, Zeling; Link, Nicholas; Hauser, Ronald; Gaziano, J. Michael; Murphy, Shawn N.; Ostrouchov, George; Ho, Yuk-Lam; Begoli, Edmon; Lu, Junwei; Cho, Kelly; Liao, Katherine P.; Cai, Tianxi |
author_sort | Hong, Chuan |
collection | PubMed |
description | The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. Besides, we developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Features identified via KESER resulted in comparable performance to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately compared to those identified using single institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses providing a significant advance in enabling multi-center studies using EHR data. |
format | Online Article Text |
id | pubmed-8551205 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8551205 2021-10-29 NPJ Digit Med Article Nature Publishing Group UK 2021-10-27 /pmc/articles/PMC8551205/ /pubmed/34707226 http://dx.doi.org/10.1038/s41746-021-00519-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551205/ https://www.ncbi.nlm.nih.gov/pubmed/34707226 http://dx.doi.org/10.1038/s41746-021-00519-z |