
Time-sensitive clinical concept embeddings learned from large electronic health records

BACKGROUND: Learning distributional representations of clinical concepts (e.g., diseases, drugs, and labs) is an important research area of deep learning in the medical domain. However, many existing relevant methods do not consider temporal dependencies along the longitudinal sequence of a patient’s...


Bibliographic Details
Main authors: Xiang, Yang, Xu, Jun, Si, Yuqi, Li, Zhiheng, Rasmy, Laila, Zhou, Yujia, Tiryaki, Firat, Li, Fang, Zhang, Yaoyun, Wu, Yonghui, Jiang, Xiaoqian, Zheng, Wenjin Jim, Zhi, Degui, Tao, Cui, Xu, Hua
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454598/
https://www.ncbi.nlm.nih.gov/pubmed/30961579
http://dx.doi.org/10.1186/s12911-019-0766-3
author Xiang, Yang
Xu, Jun
Si, Yuqi
Li, Zhiheng
Rasmy, Laila
Zhou, Yujia
Tiryaki, Firat
Li, Fang
Zhang, Yaoyun
Wu, Yonghui
Jiang, Xiaoqian
Zheng, Wenjin Jim
Zhi, Degui
Tao, Cui
Xu, Hua
author_sort Xiang, Yang
collection PubMed
description BACKGROUND: Learning distributional representations of clinical concepts (e.g., diseases, drugs, and labs) is an important research area of deep learning in the medical domain. However, many existing methods do not consider temporal dependencies along the longitudinal sequence of a patient’s records, which may lead to incorrect selection of contexts. METHODS: To address this issue, we extended three popular concept embedding learning methods, word2vec, positive pointwise mutual information (PPMI) and FastText, to consider time-sensitive information. We then trained them on a large electronic health records (EHR) database containing about 50 million patients to generate concept embeddings, and evaluated the embeddings with both intrinsic evaluations, which focus on concept similarity measures, and an extrinsic evaluation, which assesses the use of the generated concept embeddings in the task of predicting disease onset. RESULTS: Our experiments show that embeddings learned from information within one visit (time window zero) improve performance on the concept similarity measure, and that the FastText algorithm usually performed better than the other two algorithms. For the predictive modeling task, the optimal result was achieved by word2vec embeddings with a 30-day sliding window. CONCLUSIONS: Considering time constraints is important in training clinical concept embeddings. We expect that such embeddings can benefit a series of downstream applications.
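The idea of restricting contexts by time, as the abstract describes for PPMI, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`windowed_pairs`, `count_pairs`, `ppmi`), the toy patient records, and the use of "same calendar day" as a stand-in for "same visit" (time window zero) are all assumptions made for illustration only.

```python
from collections import Counter
from datetime import date
from itertools import combinations
from math import log2


def windowed_pairs(events, window_days):
    """Yield unordered concept pairs recorded at most window_days apart.

    events: list of (date, concept_code) tuples for ONE patient.
    window_days=0 keeps only same-day pairs (assumed proxy for one visit).
    """
    ordered = sorted(events)
    # All pairs are compared, which is quadratic; fine for a toy example.
    for (d1, c1), (d2, c2) in combinations(ordered, 2):
        if c1 != c2 and (d2 - d1).days <= window_days:
            yield tuple(sorted((c1, c2)))


def count_pairs(patients, window_days):
    """Aggregate time-constrained co-occurrence counts over all patients."""
    counts = Counter()
    for events in patients.values():
        counts.update(windowed_pairs(events, window_days))
    return counts


def ppmi(pair_counts):
    """Positive PMI for symmetric co-occurrence counts.

    Each unordered pair is treated as two ordered co-occurrences, so the
    total mass is 2 * total and PMI = log2(n * 2 * total / (m_a * m_b)).
    """
    total = sum(pair_counts.values())
    marginal = Counter()
    for (a, b), n in pair_counts.items():
        marginal[a] += n
        marginal[b] += n
    scores = {}
    for (a, b), n in pair_counts.items():
        pmi = log2(n * 2 * total / (marginal[a] * marginal[b]))
        scores[(a, b)] = max(pmi, 0.0)  # clip negative values to zero
    return scores


# Toy, invented records: two hypothetical patients with dated concept codes.
patients = {
    "p1": [(date(2019, 1, 1), "E11"), (date(2019, 1, 1), "metformin"),
           (date(2019, 3, 1), "I10")],
    "p2": [(date(2019, 2, 1), "E11"), (date(2019, 2, 10), "metformin"),
           (date(2019, 2, 10), "I10")],
}

same_visit_counts = count_pairs(patients, window_days=0)
thirty_day_scores = ppmi(count_pairs(patients, window_days=30))
```

Widening `window_days` admits more distant concepts as contexts, which is the trade-off the abstract reports: window zero for the similarity evaluations, and a 30-day window for the disease-onset prediction task.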
format Online
Article
Text
id pubmed-6454598
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6454598 2019-04-19 Time-sensitive clinical concept embeddings learned from large electronic health records. BMC Med Inform Decis Mak. Research. BioMed Central 2019-04-09 /pmc/articles/PMC6454598/ /pubmed/30961579 http://dx.doi.org/10.1186/s12911-019-0766-3 Text en
© The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Time-sensitive clinical concept embeddings learned from large electronic health records
topic Research