Generating contextual embeddings for emergency department chief complaints
OBJECTIVE: We learn contextual embeddings for emergency department (ED) chief complaints using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art language model, to derive a compact and computationally useful representation for free-text chief complaints. MATERIALS AN...
Main Authors: | Chang, David; Hong, Woo Suk; Taylor, Richard Andrew |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Brief Communications |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382638/ https://www.ncbi.nlm.nih.gov/pubmed/32734154 http://dx.doi.org/10.1093/jamiaopen/ooaa022 |
_version_ | 1783563284363870208 |
---|---|
author | Chang, David Hong, Woo Suk Taylor, Richard Andrew |
author_facet | Chang, David Hong, Woo Suk Taylor, Richard Andrew |
author_sort | Chang, David |
collection | PubMed |
description | OBJECTIVE: We learn contextual embeddings for emergency department (ED) chief complaints using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art language model, to derive a compact and computationally useful representation for free-text chief complaints. MATERIALS AND METHODS: Retrospective data on 2.1 million adult and pediatric ED visits was obtained from a large healthcare system covering the period of March 2013 to July 2019. A total of 355 497 (16.4%) visits from 65 737 (8.9%) patients were removed for absence of either a structured or unstructured chief complaint. To ensure adequate training set size, chief complaint labels that comprised less than 0.01%, or 1 in 10 000, of all visits were excluded. The cutoff threshold was incremented on a log scale to create seven datasets of decreasing sparsity. The classification task was to predict the provider-assigned label from the free-text chief complaint using BERT, with Long Short-Term Memory (LSTM) and Embeddings from Language Models (ELMo) as baselines. Performance was measured as the Top-k accuracy from k = 1:5 on a hold-out test set comprising 5% of the samples. The embedding for each free-text chief complaint was extracted as the final 768-dimensional layer of the BERT model and visualized using t-distributed stochastic neighbor embedding (t-SNE). RESULTS: The models achieved increasing performance with datasets of decreasing sparsity, with BERT outperforming both LSTM and ELMo. The BERT model yielded Top-1 accuracies of 0.65 and 0.69, Top-3 accuracies of 0.87 and 0.90, and Top-5 accuracies of 0.92 and 0.94 on datasets comprised of 434 and 188 labels, respectively. Visualization using t-SNE mapped the learned embeddings in a clinically meaningful way, with related concepts embedded close to each other and broader types of chief complaints clustered together. DISCUSSION: Despite the inherent noise in the chief complaint label space, the model was able to learn a rich representation of chief complaints and generate reasonable predictions of their labels. The learned embeddings accurately predict provider-assigned chief complaint labels and map semantically similar chief complaints to nearby points in vector space. CONCLUSION: Such a model may be used to automatically map free-text chief complaints to structured fields and to assist the development of a standardized, data-driven ontology of chief complaints for healthcare institutions. |
format | Online Article Text |
id | pubmed-7382638 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7382638 2020-07-29 Generating contextual embeddings for emergency department chief complaints Chang, David Hong, Woo Suk Taylor, Richard Andrew JAMIA Open Brief Communications Oxford University Press 2020-07-15 /pmc/articles/PMC7382638/ /pubmed/32734154 http://dx.doi.org/10.1093/jamiaopen/ooaa022 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Brief Communications Chang, David Hong, Woo Suk Taylor, Richard Andrew Generating contextual embeddings for emergency department chief complaints |
title | Generating contextual embeddings for emergency department chief complaints |
title_full | Generating contextual embeddings for emergency department chief complaints |
title_fullStr | Generating contextual embeddings for emergency department chief complaints |
title_full_unstemmed | Generating contextual embeddings for emergency department chief complaints |
title_short | Generating contextual embeddings for emergency department chief complaints |
title_sort | generating contextual embeddings for emergency department chief complaints |
topic | Brief Communications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382638/ https://www.ncbi.nlm.nih.gov/pubmed/32734154 http://dx.doi.org/10.1093/jamiaopen/ooaa022 |
work_keys_str_mv | AT changdavid generatingcontextualembeddingsforemergencydepartmentchiefcomplaints AT hongwoosuk generatingcontextualembeddingsforemergencydepartmentchiefcomplaints AT taylorrichardandrew generatingcontextualembeddingsforemergencydepartmentchiefcomplaints |
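
The abstract in this record describes two steps that a short example may make more concrete: excluding chief complaint labels that make up less than 0.01% of visits, and scoring a classifier by Top-k accuracy. The following is a minimal sketch of both ideas in plain Python, not the authors' code. The function names `filter_rare_labels` and `top_k_accuracy`, the representation of model output as one label-to-score dictionary per sample, and the toy labels and scores are all assumptions made for illustration.

```python
from collections import Counter


def filter_rare_labels(texts, labels, min_fraction=1e-4):
    """Keep only samples whose label covers at least min_fraction of all samples.

    The default of 1e-4 mirrors the 0.01% (1 in 10 000) cutoff quoted in the
    abstract; the function itself is a hypothetical helper.
    """
    counts = Counter(labels)
    total = len(labels)
    kept = [
        (text, label)
        for text, label in zip(texts, labels)
        if counts[label] / total >= min_fraction
    ]
    return [text for text, _ in kept], [label for _, label in kept]


def top_k_accuracy(scores, true_labels, k):
    """Fraction of samples whose true label is among the k highest-scoring labels.

    scores is assumed to be a list of dicts mapping label -> score,
    one dict per sample (an illustrative format, not the paper's).
    """
    if not true_labels:
        raise ValueError("true_labels must not be empty")
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Rank labels from highest to lowest score and keep the first k.
        ranked = sorted(row, key=row.get, reverse=True)[:k]
        hits += truth in ranked
    return hits / len(true_labels)


# Toy data, invented for illustration only.
toy_scores = [
    {"chest pain": 0.7, "abdominal pain": 0.2, "fall": 0.1},
    {"chest pain": 0.5, "abdominal pain": 0.3, "fall": 0.2},
    {"chest pain": 0.6, "abdominal pain": 0.3, "fall": 0.1},
]
toy_truth = ["chest pain", "abdominal pain", "fall"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(toy_scores, toy_truth, k))
```

Raising `min_fraction` step by step (for example by factors of ten) would produce progressively less sparse label sets, which is one plausible reading of the log-scale cutoff described in the abstract.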