Using word embeddings to expand terminology of dietary supplements on clinical notes

OBJECTIVE: The objective of this study is to demonstrate the feasibility of applying word embeddings to expand the terminology of dietary supplements (DS) using over 26 million clinical notes. METHODS: Word embedding models (ie, word2vec and GloVe) trained on clinical notes were used to predefine a...

Bibliographic Details
Main authors: Fan, Yadan; Pakhomov, Serguei; McEwan, Reed; Zhao, Wendi; Lindemann, Elizabeth; Zhang, Rui
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6904105/
https://www.ncbi.nlm.nih.gov/pubmed/31825016
http://dx.doi.org/10.1093/jamiaopen/ooz007
author Fan, Yadan
Pakhomov, Serguei
McEwan, Reed
Zhao, Wendi
Lindemann, Elizabeth
Zhang, Rui
collection PubMed
description OBJECTIVE: The objective of this study is to demonstrate the feasibility of applying word embeddings to expand the terminology of dietary supplements (DS) using over 26 million clinical notes. METHODS: Word embedding models (ie, word2vec and GloVe) trained on clinical notes were used to predefine a list of the top 40 semantically related terms for each of 14 commonly used DS. Each list was further evaluated by experts to generate semantically similar terms. We investigated the effect of corpus size and other settings (ie, vector size and window size), as well as of the 2 word embedding models, on performance for DS term expansion. We compared the number of clinical notes (and the patients they represent) retrieved using the word embedding expanded terms with the numbers retrieved using both the baseline terms and the terms expanded from external DS sources. RESULTS: Using the word embedding models trained on clinical notes, we could identify 1–12 semantically similar terms for each DS. Using the word embedding expanded terms, we were able to retrieve on average 8.39% more clinical notes and 11.68% more patients for each DS compared with 2 sets of terms. Increasing the corpus size yields more misspellings, but not more semantic variants or brand names. The word2vec model was also found to be more capable of detecting semantically similar terms than GloVe. CONCLUSION: Our study demonstrates the utility of word embeddings on clinical notes for terminology expansion on 14 DS. We propose that this method can potentially be applied to create a DS vocabulary for downstream applications, such as information extraction.
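The candidate-generation step described in METHODS (a list of the top 40 related terms per DS) can be illustrated with a minimal sketch. It is not the authors' implementation: the study used word2vec and GloVe models trained on clinical notes, whereas the vectors below are invented by hand, and cosine similarity is an assumed ranking measure that the abstract does not name. The function names `cosine` and `top_k_related` and the toy vocabulary are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_u * norm_v)


def top_k_related(seed, embeddings, k=40):
    """Return up to k (term, score) pairs ranked by similarity to the seed."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed  # the seed term is not its own candidate
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-made 3-dimensional vectors, invented purely for illustration.
# Real models would have far larger vocabularies and vector sizes.
TOY_EMBEDDINGS = {
    "fish_oil": [0.90, 0.10, 0.00],
    "fishoil": [0.88, 0.12, 0.01],  # stands in for a misspelling
    "omega3": [0.80, 0.30, 0.10],   # stands in for a semantic variant
    "aspirin": [0.10, 0.90, 0.20],
    "walker": [0.00, 0.10, 0.95],
}

if __name__ == "__main__":
    for term, score in top_k_related("fish_oil", TOY_EMBEDDINGS, k=3):
        print(f"{term}\t{score:.3f}")
```

In the study the ranked list was only a starting point: experts reviewed each list and kept the terms judged semantically similar, so a sketch like this covers the automatic half of the workflow only.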
format Online
Article
Text
id pubmed-6904105
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6904105 2019-12-10 Using word embeddings to expand terminology of dietary supplements on clinical notes. Fan, Yadan; Pakhomov, Serguei; McEwan, Reed; Zhao, Wendi; Lindemann, Elizabeth; Zhang, Rui. JAMIA Open. Research and Applications. Oxford University Press 2019-03-28 /pmc/articles/PMC6904105/ /pubmed/31825016 http://dx.doi.org/10.1093/jamiaopen/ooz007 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Using word embeddings to expand terminology of dietary supplements on clinical notes
topic Research and Applications