Generating contextual embeddings for emergency department chief complaints

OBJECTIVE: We learn contextual embeddings for emergency department (ED) chief complaints using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art language model, to derive a compact and computationally useful representation for free-text chief complaints. MATERIALS AN...

Bibliographic Details
Main Authors: Chang, David; Hong, Woo Suk; Taylor, Richard Andrew
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382638/
https://www.ncbi.nlm.nih.gov/pubmed/32734154
http://dx.doi.org/10.1093/jamiaopen/ooaa022
_version_ 1783563284363870208
author Chang, David
Hong, Woo Suk
Taylor, Richard Andrew
collection PubMed
description OBJECTIVE: We learn contextual embeddings for emergency department (ED) chief complaints using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art language model, to derive a compact and computationally useful representation for free-text chief complaints.
MATERIALS AND METHODS: Retrospective data on 2.1 million adult and pediatric ED visits were obtained from a large healthcare system for the period March 2013 to July 2019. A total of 355 497 (16.4%) visits from 65 737 (8.9%) patients were removed for absence of either a structured or unstructured chief complaint. To ensure adequate training set size, chief complaint labels that comprised less than 0.01%, or 1 in 10 000, of all visits were excluded. The cutoff threshold was incremented on a log scale to create seven datasets of decreasing sparsity. The classification task was to predict the provider-assigned label from the free-text chief complaint using BERT, with Long Short-Term Memory (LSTM) and Embeddings from Language Models (ELMo) as baselines. Performance was measured as Top-k accuracy for k = 1 to 5 on a hold-out test set comprising 5% of the samples. The embedding for each free-text chief complaint was extracted as the final 768-dimensional layer of the BERT model and visualized using t-distributed stochastic neighbor embedding (t-SNE).
RESULTS: The models achieved increasing performance with datasets of decreasing sparsity, with BERT outperforming both LSTM and ELMo. The BERT model yielded Top-1 accuracies of 0.65 and 0.69, Top-3 accuracies of 0.87 and 0.90, and Top-5 accuracies of 0.92 and 0.94 on datasets comprising 434 and 188 labels, respectively. Visualization using t-SNE mapped the learned embeddings in a clinically meaningful way, with related concepts embedded close to each other and broader types of chief complaints clustered together.
DISCUSSION: Despite the inherent noise in the chief complaint label space, the model was able to learn a rich representation of chief complaints and generate reasonable predictions of their labels. The learned embeddings accurately predict provider-assigned chief complaint labels and map semantically similar chief complaints to nearby points in vector space.
CONCLUSION: Such a model may be used to automatically map free-text chief complaints to structured fields and to assist the development of a standardized, data-driven ontology of chief complaints for healthcare institutions.
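The two bookkeeping steps named in the abstract, dropping labels below a frequency cutoff and scoring predictions by Top-k accuracy, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `filter_rare_labels` and `top_k_accuracy`, the toy labels, and the toy scores are all assumptions made for illustration. Only the 0.01% default cutoff and the definition of Top-k accuracy come from the abstract.

```python
from collections import Counter
from typing import List, Sequence


def filter_rare_labels(labels: Sequence[str], min_fraction: float = 0.0001) -> List[int]:
    """Return indices of visits whose label covers at least min_fraction of all visits.

    The default of 0.0001 mirrors the 0.01% (1 in 10 000) cutoff in the abstract;
    the function itself is a hypothetical helper.
    """
    counts = Counter(labels)
    total = len(labels)
    keep = {label for label, n in counts.items() if n / total >= min_fraction}
    return [i for i, label in enumerate(labels) if label in keep]


def top_k_accuracy(scores: Sequence[Sequence[float]], true_idx: Sequence[int], k: int) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, truth in zip(scores, true_idx):
        # Rank class indices from highest to lowest score for this sample.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_idx)


# Toy data (invented): 10 visits, 3 labels, with a deliberately high cutoff of 20%.
toy_labels = ["chest pain"] * 6 + ["fever"] * 3 + ["hiccups"]
kept = filter_rare_labels(toy_labels, min_fraction=0.2)

# Toy classifier scores (invented) for 3 samples over 3 classes.
toy_scores = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]
toy_truth = [0, 1, 0]
top1 = top_k_accuracy(toy_scores, toy_truth, k=1)
top2 = top_k_accuracy(toy_scores, toy_truth, k=2)
```

On the toy data, the single "hiccups" visit falls below the 20% cutoff and is dropped, and accuracy rises from Top-1 to Top-2 because two samples have their true label ranked second.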
format Online
Article
Text
id pubmed-7382638
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7382638 2020-07-29 Generating contextual embeddings for emergency department chief complaints Chang, David Hong, Woo Suk Taylor, Richard Andrew JAMIA Open Brief Communications Oxford University Press 2020-07-15 /pmc/articles/PMC7382638/ /pubmed/32734154 http://dx.doi.org/10.1093/jamiaopen/ooaa022 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Generating contextual embeddings for emergency department chief complaints
topic Brief Communications