Learning unsupervised contextual representations for medical synonym discovery
Main Authors: | Schumacher, Elliot; Dredze, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Research and Applications |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994012/ https://www.ncbi.nlm.nih.gov/pubmed/32025651 http://dx.doi.org/10.1093/jamiaopen/ooz057 |
_version_ | 1783493135709503488 |
---|---|
author | Schumacher, Elliot; Dredze, Mark
author_facet | Schumacher, Elliot; Dredze, Mark
author_sort | Schumacher, Elliot |
collection | PubMed |
description | OBJECTIVES: An important component of processing medical texts is the identification of synonymous words or phrases. Synonyms can inform learned representations of patients or improve linking mentioned concepts to medical ontologies. However, medical synonyms can be lexically similar (“dilated RA” and “dilated RV”) or dissimilar (“cerebrovascular accident” and “stroke”); contextual information can determine whether 2 strings are synonymous. Medical professionals use extensive variation in medical terminology, often not evidenced in structured medical resources. Therefore, the ability to discover synonyms, especially without reliance on training data, is an important component in processing clinical notes. The ability to discover synonyms from models trained on large amounts of unannotated data removes the need to rely on annotated pairs of similar words. Models relying solely on unannotated data can be trained on a wider variety of texts without the cost of annotation, and thus may capture a broader variety of language. MATERIALS AND METHODS: Recent contextualized deep learning representation models, such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2019), have shown strong improvements over previous approaches in a broad variety of tasks. We leverage these contextualized deep learning models to build representations of synonyms, which integrate the context of the surrounding sentence and use character-level models to alleviate out-of-vocabulary issues. Using these models, we perform unsupervised discovery of likely synonym matches, which reduces the reliance on expensive training data. RESULTS: We use the ShARe/CLEF eHealth Evaluation Lab 2013 Task 1b data to evaluate our synonym discovery method. Comparing our proposed contextualized deep learning representations to previous non-neural representations, we find that the contextualized representations show consistent improvement over non-contextualized models in all metrics. CONCLUSIONS: Our results show that contextualized models produce effective representations for synonym discovery. We expect that the use of these representations in other tasks would produce similar gains in performance. |
format | Online Article Text |
id | pubmed-6994012 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6994012 2020-02-05 Learning unsupervised contextual representations for medical synonym discovery Schumacher, Elliot; Dredze, Mark JAMIA Open, Research and Applications. Oxford University Press 2019-11-04 /pmc/articles/PMC6994012/ /pubmed/32025651 http://dx.doi.org/10.1093/jamiaopen/ooz057 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research and Applications Schumacher, Elliot Dredze, Mark Learning unsupervised contextual representations for medical synonym discovery |
title | Learning unsupervised contextual representations for medical synonym discovery |
title_full | Learning unsupervised contextual representations for medical synonym discovery |
title_fullStr | Learning unsupervised contextual representations for medical synonym discovery |
title_full_unstemmed | Learning unsupervised contextual representations for medical synonym discovery |
title_short | Learning unsupervised contextual representations for medical synonym discovery |
title_sort | learning unsupervised contextual representations for medical synonym discovery |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994012/ https://www.ncbi.nlm.nih.gov/pubmed/32025651 http://dx.doi.org/10.1093/jamiaopen/ooz057 |
work_keys_str_mv | AT schumacherelliot learningunsupervisedcontextualrepresentationsformedicalsynonymdiscovery AT dredzemark learningunsupervisedcontextualrepresentationsformedicalsynonymdiscovery |
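The MATERIALS AND METHODS section of the abstract describes building contextualized representations of synonym mentions and discovering likely synonym matches without annotated training pairs. The sketch below is only a minimal illustration of that general idea, not the authors' pipeline: it assumes a Hugging Face BERT encoder (`bert-base-uncased`), mean-pooling of the subword vectors covering a mention, and cosine-similarity ranking of candidate mentions; the model name, pooling choice, and example sentences are all assumptions for illustration.

```python
# Illustrative sketch (not the paper's exact method): embed a mention in its
# sentence context with a BERT-style encoder, then rank candidate mentions by
# cosine similarity to surface likely synonyms without annotated pairs.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumption; a clinical/biomedical variant could be swapped in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()


def embed_mention(sentence: str, mention: str) -> torch.Tensor:
    """Mean-pool the final-layer vectors of the subword tokens covering `mention`."""
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    start = sentence.lower().index(mention.lower())
    end = start + len(mention)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    # Keep subword positions whose character span overlaps the mention span
    # (special tokens have empty (0, 0) offsets and are excluded).
    keep = [i for i, (s, e) in enumerate(offsets.tolist()) if s < end and e > start and e > s]
    return hidden[keep].mean(dim=0)


def rank_candidates(query_sent, query_mention, candidates):
    """Return (similarity, mention) pairs sorted by cosine similarity to the query mention."""
    q = embed_mention(query_sent, query_mention)
    scored = []
    for sent, mention in candidates:
        c = embed_mention(sent, mention)
        sim = torch.nn.functional.cosine_similarity(q, c, dim=0).item()
        scored.append((sim, mention))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical example sentences echoing the abstract's synonym pairs.
    query = ("The patient suffered a cerebrovascular accident last year.", "cerebrovascular accident")
    pool = [
        ("She was admitted after a stroke.", "stroke"),
        ("Echo showed a dilated RA.", "dilated RA"),
        ("Echo showed a dilated RV.", "dilated RV"),
    ]
    print(rank_candidates(*query, pool))
```

In this toy setup, the contextual embedding of “cerebrovascular accident” would be expected to score “stroke” above the lexically similar but non-synonymous “dilated RA”/“dilated RV” pair, which is the behavior the abstract attributes to contextualized representations over purely lexical ones.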