Learning unsupervised contextual representations for medical synonym discovery
Main Authors: | Schumacher, Elliot; Dredze, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Research and Applications |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994012/ https://www.ncbi.nlm.nih.gov/pubmed/32025651 http://dx.doi.org/10.1093/jamiaopen/ooz057 |
_version_ | 1783493135709503488 |
---|---|
author | Schumacher, Elliot; Dredze, Mark
author_facet | Schumacher, Elliot; Dredze, Mark
author_sort | Schumacher, Elliot |
collection | PubMed |
description | OBJECTIVES: An important component of processing medical texts is the identification of synonymous words or phrases. Synonyms can inform learned representations of patients or improve linking mentioned concepts to medical ontologies. However, medical synonyms can be lexically similar (“dilated RA” and “dilated RV”) or dissimilar (“cerebrovascular accident” and “stroke”); contextual information can determine whether 2 strings are synonymous. Medical professionals use extensive variation in medical terminology, often not evidenced in structured medical resources. Therefore, the ability to discover synonyms, especially without reliance on training data, is an important component in processing clinical notes. The ability to discover synonyms from models trained on large amounts of unannotated data removes the need to rely on annotated pairs of similar words. Models relying solely on unannotated data can be trained on a wider variety of texts without the cost of annotation, and thus may capture a broader variety of language. MATERIALS AND METHODS: Recent contextualized deep learning representation models, such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2019), have shown strong improvements over previous approaches in a broad variety of tasks. We leverage these contextualized deep learning models to build representations of synonyms, which integrate the context of the surrounding sentence and use character-level models to alleviate out-of-vocabulary issues. Using these models, we perform unsupervised discovery of likely synonym matches, which reduces the reliance on expensive training data. RESULTS: We use the ShARe/CLEF eHealth Evaluation Lab 2013 Task 1b data to evaluate our synonym discovery method. Comparing our proposed contextualized deep learning representations to previous non-neural representations, we find that the contextualized representations show consistent improvement over non-contextualized models in all metrics. CONCLUSIONS: Our results show that contextualized models produce effective representations for synonym discovery. We expect that the use of these representations in other tasks would produce similar gains in performance. |
format | Online Article Text |
id | pubmed-6994012 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6994012 2020-02-05 Learning unsupervised contextual representations for medical synonym discovery Schumacher, Elliot; Dredze, Mark JAMIA Open, Research and Applications. Oxford University Press 2019-11-04 /pmc/articles/PMC6994012/ /pubmed/32025651 http://dx.doi.org/10.1093/jamiaopen/ooz057 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research and Applications Schumacher, Elliot Dredze, Mark Learning unsupervised contextual representations for medical synonym discovery |
title | Learning unsupervised contextual representations for medical synonym discovery |
title_full | Learning unsupervised contextual representations for medical synonym discovery |
title_fullStr | Learning unsupervised contextual representations for medical synonym discovery |
title_full_unstemmed | Learning unsupervised contextual representations for medical synonym discovery |
title_short | Learning unsupervised contextual representations for medical synonym discovery |
title_sort | learning unsupervised contextual representations for medical synonym discovery |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994012/ https://www.ncbi.nlm.nih.gov/pubmed/32025651 http://dx.doi.org/10.1093/jamiaopen/ooz057 |
work_keys_str_mv | AT schumacherelliot learningunsupervisedcontextualrepresentationsformedicalsynonymdiscovery AT dredzemark learningunsupervisedcontextualrepresentationsformedicalsynonymdiscovery |
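The MATERIALS AND METHODS section of the abstract describes building contextualized representations of synonym mentions and discovering likely synonym matches without annotated training pairs. The sketch below is only a minimal illustration of that general idea, not the authors' pipeline: it assumes a Hugging Face BERT encoder (`bert-base-uncased`), mean-pooling of the subword vectors covering a mention, and cosine-similarity ranking of candidate mentions; the model name, pooling choice, and example sentences are all assumptions for illustration.

```python
# Illustrative sketch (not the paper's exact method): embed a mention in its
# sentence context with a BERT-style encoder, then rank candidate mentions by
# cosine similarity to surface likely synonyms without annotated pairs.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumption; a clinical/biomedical variant could be swapped in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()


def embed_mention(sentence: str, mention: str) -> torch.Tensor:
    """Mean-pool the final-layer vectors of the subword tokens covering `mention`."""
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    start = sentence.lower().index(mention.lower())
    end = start + len(mention)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    # Keep subword positions whose character span overlaps the mention span
    # (special tokens have empty (0, 0) offsets and are excluded).
    keep = [i for i, (s, e) in enumerate(offsets.tolist()) if s < end and e > start and e > s]
    return hidden[keep].mean(dim=0)


def rank_candidates(query_sent, query_mention, candidates):
    """Return (similarity, mention) pairs sorted by cosine similarity to the query mention."""
    q = embed_mention(query_sent, query_mention)
    scored = []
    for sent, mention in candidates:
        c = embed_mention(sent, mention)
        sim = torch.nn.functional.cosine_similarity(q, c, dim=0).item()
        scored.append((sim, mention))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical example sentences echoing the abstract's synonym pairs.
    query = ("The patient suffered a cerebrovascular accident last year.", "cerebrovascular accident")
    pool = [
        ("She was admitted after a stroke.", "stroke"),
        ("Echo showed a dilated RA.", "dilated RA"),
        ("Echo showed a dilated RV.", "dilated RV"),
    ]
    print(rank_candidates(*query, pool))
```

In this toy setup, the contextual embedding of “cerebrovascular accident” would be expected to score “stroke” above the lexically similar but non-synonymous “dilated RA”/“dilated RV” pair, which is the behavior the abstract attributes to contextualized representations over purely lexical ones.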