Cross-lingual Unified Medical Language System entity linking in online health communities
OBJECTIVE: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be...
Main authors: | Bitton, Yonatan; Cohen, Raphael; Schifter, Tamar; Bachmat, Eitan; Elhadad, Michael; Elhadad, Noémie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566404/ https://www.ncbi.nlm.nih.gov/pubmed/32910823 http://dx.doi.org/10.1093/jamia/ocaa150 |
_version_ | 1783596128160186368 |
---|---|
author | Bitton, Yonatan Cohen, Raphael Schifter, Tamar Bachmat, Eitan Elhadad, Michael Elhadad, Noémie |
author_facet | Bitton, Yonatan Cohen, Raphael Schifter, Tamar Bachmat, Eitan Elhadad, Michael Elhadad, Noémie |
author_sort | Bitton, Yonatan |
collection | PubMed |
description | OBJECTIVE: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce this variability, medical terms must be normalized, for example by linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them with UMLS entities. MATERIALS AND METHODS: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder to transliterate words and map UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify and link medical entities in the Hebrew corpus to UMLS concepts, by producing a high-recall list of candidate medical terms in the corpus and then filtering the candidates to relevant medical terms. RESULTS: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. MDTEL tagging and normalization of Camoni posts achieved 99% accuracy, 92% recall, and 87% precision. When tagging and normalizing terms in queries from the Camoni search logs, UMLS-normalized queries improved search results in 46% of the cases. CONCLUSIONS: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel). |
format | Online Article Text |
id | pubmed-7566404 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7566404 2020-10-20 Cross-lingual Unified Medical Language System entity linking in online health communities Bitton, Yonatan Cohen, Raphael Schifter, Tamar Bachmat, Eitan Elhadad, Michael Elhadad, Noémie J Am Med Inform Assoc Research and Applications Oxford University Press 2020-09-10 /pmc/articles/PMC7566404/ /pubmed/32910823 http://dx.doi.org/10.1093/jamia/ocaa150 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research and Applications Bitton, Yonatan Cohen, Raphael Schifter, Tamar Bachmat, Eitan Elhadad, Michael Elhadad, Noémie Cross-lingual Unified Medical Language System entity linking in online health communities |
title | Cross-lingual Unified Medical Language System entity linking in online health communities |
title_full | Cross-lingual Unified Medical Language System entity linking in online health communities |
title_fullStr | Cross-lingual Unified Medical Language System entity linking in online health communities |
title_full_unstemmed | Cross-lingual Unified Medical Language System entity linking in online health communities |
title_short | Cross-lingual Unified Medical Language System entity linking in online health communities |
title_sort | cross-lingual unified medical language system entity linking in online health communities |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566404/ https://www.ncbi.nlm.nih.gov/pubmed/32910823 http://dx.doi.org/10.1093/jamia/ocaa150 |
work_keys_str_mv | AT bittonyonatan crosslingualunifiedmedicallanguagesystementitylinkinginonlinehealthcommunities AT cohenraphael crosslingualunifiedmedicallanguagesystementitylinkinginonlinehealthcommunities AT schiftertamar crosslingualunifiedmedicallanguagesystementitylinkinginonlinehealthcommunities AT bachmateitan crosslingualunifiedmedicallanguagesystementitylinkinginonlinehealthcommunities AT elhadadmichael crosslingualunifiedmedicallanguagesystementitylinkinginonlinehealthcommunities AT elhadadnoemie crosslingualunifiedmedicallanguagesystementitylinkinginonlinehealthcommunities |