Generating contextual embeddings for emergency department chief complaints

OBJECTIVE: We learn contextual embeddings for emergency department (ED) chief complaints using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art language model, to derive a compact and computationally useful representation for free-text chief complaints. MATERIALS AN...

Bibliographic Details
Main Authors:	Chang, David; Hong, Woo Suk; Taylor, Richard Andrew
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Brief Communications
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382638/ https://www.ncbi.nlm.nih.gov/pubmed/32734154 http://dx.doi.org/10.1093/jamiaopen/ooaa022

author	Chang, David; Hong, Woo Suk; Taylor, Richard Andrew
author_sort	Chang, David
collection	PubMed
description	OBJECTIVE: We learn contextual embeddings for emergency department (ED) chief complaints using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art language model, to derive a compact and computationally useful representation for free-text chief complaints. MATERIALS AND METHODS: Retrospective data on 2.1 million adult and pediatric ED visits was obtained from a large healthcare system covering the period of March 2013 to July 2019. A total of 355 497 (16.4%) visits from 65 737 (8.9%) patients were removed for absence of either a structured or unstructured chief complaint. To ensure adequate training set size, chief complaint labels that comprised less than 0.01%, or 1 in 10 000, of all visits were excluded. The cutoff threshold was incremented on a log scale to create seven datasets of decreasing sparsity. The classification task was to predict the provider-assigned label from the free-text chief complaint using BERT, with Long Short-Term Memory (LSTM) and Embeddings from Language Models (ELMo) as baselines. Performance was measured as the Top-k accuracy from k = 1:5 on a hold-out test set comprising 5% of the samples. The embedding for each free-text chief complaint was extracted as the final 768-dimensional layer of the BERT model and visualized using t-distributed stochastic neighbor embedding (t-SNE). RESULTS: The models achieved increasing performance with datasets of decreasing sparsity, with BERT outperforming both LSTM and ELMo. The BERT model yielded Top-1 accuracies of 0.65 and 0.69, Top-3 accuracies of 0.87 and 0.90, and Top-5 accuracies of 0.92 and 0.94 on datasets comprised of 434 and 188 labels, respectively. Visualization using t-SNE mapped the learned embeddings in a clinically meaningful way, with related concepts embedded close to each other and broader types of chief complaints clustered together. 
DISCUSSION: Despite the inherent noise in the chief complaint label space, the model was able to learn a rich representation of chief complaints and generate reasonable predictions of their labels. The learned embeddings accurately predict provider-assigned chief complaint labels and map semantically similar chief complaints to nearby points in vector space. CONCLUSION: Such a model may be used to automatically map free-text chief complaints to structured fields and to assist the development of a standardized, data-driven ontology of chief complaints for healthcare institutions.
format	Online Article Text
id	pubmed-7382638
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7382638 2020-07-29 Generating contextual embeddings for emergency department chief complaints Chang, David Hong, Woo Suk Taylor, Richard Andrew JAMIA Open Brief Communications Oxford University Press 2020-07-15 /pmc/articles/PMC7382638/ /pubmed/32734154 http://dx.doi.org/10.1093/jamiaopen/ooaa022 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Generating contextual embeddings for emergency department chief complaints
topic	Brief Communications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382638/ https://www.ncbi.nlm.nih.gov/pubmed/32734154 http://dx.doi.org/10.1093/jamiaopen/ooaa022
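
The label filtering and Top-k evaluation described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `filter_rare_labels` and `top_k_accuracy`, the toy labels, the toy scores, and the 0.2 cutoff are all hypothetical. The study itself reports a starting cutoff of 0.01% of visits and k from 1 to 5; the later cutoffs are not listed in this record, so none are reproduced here.

```python
from collections import Counter
from typing import List, Sequence


def filter_rare_labels(labels: Sequence[str], min_fraction: float) -> List[int]:
    """Return indices of visits whose label covers at least `min_fraction` of all visits."""
    counts = Counter(labels)
    total = len(labels)
    return [i for i, lab in enumerate(labels) if counts[lab] / total >= min_fraction]


def top_k_accuracy(
    scores: Sequence[Sequence[float]], true_idx: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true class index is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, true_idx):
        # Rank class indices by descending score and keep the first k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += target in top_k
    return hits / len(true_idx)


# Toy data (hypothetical): 10 visits with 3 provider-assigned labels.
toy_labels = ["chest pain"] * 6 + ["fever"] * 3 + ["hiccups"]
kept = filter_rare_labels(toy_labels, min_fraction=0.2)  # drops the rare "hiccups" visit

# Toy classifier scores (hypothetical) for 3 samples over 3 classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_truth = [1, 1, 0]
for k in (1, 2, 3):
    print(f"Top-{k} accuracy: {top_k_accuracy(toy_scores, toy_truth, k):.2f}")
```

Raising `min_fraction` removes more rare labels, which corresponds to the datasets of decreasing sparsity described in the abstract; Top-k accuracy can only stay the same or rise as k grows, which matches the reported ordering of the Top-1, Top-3, and Top-5 figures.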
