Improving medical term embeddings using UMLS Metathesaurus

BACKGROUND: Health providers create Electronic Health Records (EHRs) to describe the conditions and procedures used to treat their patients. Medical notes entered by medical staff in the form of free text are a particularly insightful component of EHRs. There is a great interest in applying machine...

Bibliographic Details
Main authors: Chanda, Ashis Kumar, Bai, Tian, Yang, Ziyu, Vucetic, Slobodan
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052653/
https://www.ncbi.nlm.nih.gov/pubmed/35488252
http://dx.doi.org/10.1186/s12911-022-01850-5
_version_ 1784696827697692672
author Chanda, Ashis Kumar
Bai, Tian
Yang, Ziyu
Vucetic, Slobodan
author_facet Chanda, Ashis Kumar
Bai, Tian
Yang, Ziyu
Vucetic, Slobodan
author_sort Chanda, Ashis Kumar
collection PubMed
description BACKGROUND: Health providers create Electronic Health Records (EHRs) to describe the conditions and procedures used to treat their patients. Medical notes entered by medical staff in the form of free text are a particularly insightful component of EHRs. There is a great interest in applying machine learning tools on medical notes in numerous medical informatics applications. Learning vector representations, or embeddings, of terms in the notes, is an important pre-processing step in such applications. However, learning good embeddings is challenging because medical notes are rich in specialized terminology, and the number of available EHRs in practical applications is often very small. METHODS: In this paper, we propose a novel algorithm to learn embeddings of medical terms from a limited set of medical notes. The algorithm, called definition2vec, exploits external information in the form of medical term definitions. It is an extension of a skip-gram algorithm that incorporates textual definitions of medical terms provided by the Unified Medical Language System (UMLS) Metathesaurus. RESULTS: To evaluate the proposed approach, we used a publicly available Medical Information Mart for Intensive Care (MIMIC-III) EHR data set. We performed quantitative and qualitative experiments to measure the usefulness of the learned embeddings. The experimental results show that definition2vec keeps the semantically similar medical terms together in the embedding vector space even when they are rare or unobserved in the corpus. We also demonstrate that learned vector embeddings are helpful in downstream medical informatics applications. CONCLUSION: This paper shows that medical term definitions can be helpful when learning embeddings of rare or previously unseen medical terms from a small corpus of specialized documents such as medical notes.
format Online
Article
Text
id pubmed-9052653
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9052653 2022-04-30 Improving medical term embeddings using UMLS Metathesaurus Chanda, Ashis Kumar Bai, Tian Yang, Ziyu Vucetic, Slobodan BMC Med Inform Decis Mak Research Article BACKGROUND: Health providers create Electronic Health Records (EHRs) to describe the conditions and procedures used to treat their patients. Medical notes entered by medical staff in the form of free text are a particularly insightful component of EHRs. There is a great interest in applying machine learning tools on medical notes in numerous medical informatics applications. Learning vector representations, or embeddings, of terms in the notes, is an important pre-processing step in such applications. However, learning good embeddings is challenging because medical notes are rich in specialized terminology, and the number of available EHRs in practical applications is often very small. METHODS: In this paper, we propose a novel algorithm to learn embeddings of medical terms from a limited set of medical notes. The algorithm, called definition2vec, exploits external information in the form of medical term definitions. It is an extension of a skip-gram algorithm that incorporates textual definitions of medical terms provided by the Unified Medical Language System (UMLS) Metathesaurus. RESULTS: To evaluate the proposed approach, we used a publicly available Medical Information Mart for Intensive Care (MIMIC-III) EHR data set. We performed quantitative and qualitative experiments to measure the usefulness of the learned embeddings. The experimental results show that definition2vec keeps the semantically similar medical terms together in the embedding vector space even when they are rare or unobserved in the corpus. We also demonstrate that learned vector embeddings are helpful in downstream medical informatics applications.
CONCLUSION: This paper shows that medical term definitions can be helpful when learning embeddings of rare or previously unseen medical terms from a small corpus of specialized documents such as medical notes. BioMed Central 2022-04-29 /pmc/articles/PMC9052653/ /pubmed/35488252 http://dx.doi.org/10.1186/s12911-022-01850-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research Article
Chanda, Ashis Kumar
Bai, Tian
Yang, Ziyu
Vucetic, Slobodan
Improving medical term embeddings using UMLS Metathesaurus
title Improving medical term embeddings using UMLS Metathesaurus
title_full Improving medical term embeddings using UMLS Metathesaurus
title_fullStr Improving medical term embeddings using UMLS Metathesaurus
title_full_unstemmed Improving medical term embeddings using UMLS Metathesaurus
title_short Improving medical term embeddings using UMLS Metathesaurus
title_sort improving medical term embeddings using umls metathesaurus
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052653/
https://www.ncbi.nlm.nih.gov/pubmed/35488252
http://dx.doi.org/10.1186/s12911-022-01850-5
work_keys_str_mv AT chandaashiskumar improvingmedicaltermembeddingsusingumlsmetathesaurus
AT baitian improvingmedicaltermembeddingsusingumlsmetathesaurus
AT yangziyu improvingmedicaltermembeddingsusingumlsmetathesaurus
AT vuceticslobodan improvingmedicaltermembeddingsusingumlsmetathesaurus
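
Illustrative note on the description field: the abstract describes definition2vec only as a skip-gram extension that incorporates UMLS term definitions, and gives no objective function. The sketch below is therefore not the authors' method. It shows one simple way such an idea could work, under the assumption that each word of a term's definition is treated as an extra skip-gram context for that term. The function names, the toy notes, and the toy definitions are all hypothetical, and the trainer is plain skip-gram with uniformly sampled negatives.

```python
import numpy as np


def build_pairs(notes, definitions, window=2):
    """Collect (target, context) training pairs.

    notes: list of token lists (the corpus of medical notes).
    definitions: dict mapping a term to its definition tokens
                 (hypothetical stand-in for UMLS definition text).
    """
    pairs = []
    # Standard skip-gram pairs: each token with its neighbours in the window.
    for tokens in notes:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            pairs.extend((target, tokens[j]) for j in range(lo, hi) if j != i)
    # Assumed extension: pair each defined term with every definition word,
    # so a term that never occurs in the notes still gets training signal.
    for term, words in definitions.items():
        pairs.extend((term, w) for w in words)
    return pairs


def train(pairs, dim=16, epochs=50, lr=0.05, negatives=3, seed=0):
    """Skip-gram with negative sampling over the given pairs."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for pair in pairs for w in pair})
    index = {w: i for i, w in enumerate(vocab)}
    w_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # target vectors
    w_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors
    labels = np.array([1.0] + [0.0] * negatives)
    for _ in range(epochs):
        for target, context in pairs:
            t = index[target]
            # One observed context plus uniformly sampled negatives
            # (a negative may occasionally collide with the positive).
            ids = [index[context]] + list(rng.integers(len(vocab), size=negatives))
            out = w_out[ids]                                # copy of the rows
            probs = 1.0 / (1.0 + np.exp(-out @ w_in[t]))    # sigmoid scores
            grad = probs - labels                           # logistic-loss gradient
            old_target = w_in[t].copy()
            w_in[t] -= lr * grad @ out
            # ufunc.at accumulates correctly when ids contains repeats.
            np.subtract.at(w_out, ids, lr * np.outer(grad, old_target))
    return index, w_in


# Toy data: "pneumonitis" appears only in the definitions, never in the notes.
notes = [
    ["patient", "has", "pneumonia"],
    ["pneumonia", "treated", "with", "antibiotics"],
]
definitions = {
    "pneumonia": ["infection", "of", "the", "lung"],
    "pneumonitis": ["inflammation", "of", "the", "lung"],
}
pairs = build_pairs(notes, definitions, window=1)
index, embeddings = train(pairs)
```

In this sketch the unobserved term receives an embedding only because its definition words contribute pairs; how the published algorithm weights or encodes definitions is described in the article itself, not here.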