
Cross-lingual Unified Medical Language System entity linking in online health communities

OBJECTIVE: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be...


Bibliographic Details
Main authors: Bitton, Yonatan, Cohen, Raphael, Schifter, Tamar, Bachmat, Eitan, Elhadad, Michael, Elhadad, Noémie
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566404/
https://www.ncbi.nlm.nih.gov/pubmed/32910823
http://dx.doi.org/10.1093/jamia/ocaa150
author Bitton, Yonatan
Cohen, Raphael
Schifter, Tamar
Bachmat, Eitan
Elhadad, Michael
Elhadad, Noémie
collection PubMed
description OBJECTIVE: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be normalized, for example by linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them with UMLS entities.
MATERIALS AND METHODS: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder to transliterate words and map UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify and link medical entities in the Hebrew corpus to UMLS concepts, by producing a high-recall list of candidate medical terms in the corpus and then filtering the candidates to relevant medical terms.
RESULTS: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. MDTEL tagging and normalizing on Camoni posts achieved 99% accuracy, 92% recall, and 87% precision. When tagging and normalizing terms in queries from the Camoni search logs, UMLS-normalized queries improved search results in 46% of the cases.
CONCLUSIONS: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel).
format Online
Article
Text
id pubmed-7566404
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7566404 2020-10-20
Cross-lingual Unified Medical Language System entity linking in online health communities
Bitton, Yonatan
Cohen, Raphael
Schifter, Tamar
Bachmat, Eitan
Elhadad, Michael
Elhadad, Noémie
J Am Med Inform Assoc
Research and Applications
Oxford University Press 2020-09-10
/pmc/articles/PMC7566404/
/pubmed/32910823
http://dx.doi.org/10.1093/jamia/ocaa150
Text en
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.
http://creativecommons.org/licenses/by-nc/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Cross-lingual Unified Medical Language System entity linking in online health communities
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566404/
https://www.ncbi.nlm.nih.gov/pubmed/32910823
http://dx.doi.org/10.1093/jamia/ocaa150
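Illustrative note: the description field outlines a two-stage linking step, a high-recall candidate list followed by a filter. The sketch below shows only that general shape and is not the authors' MDTEL code. It swaps in plain string similarity (Python's `difflib.SequenceMatcher`) for the neural transliteration model, uses Latin-script toy strings instead of Hebrew, and the lexicon, the placeholder concept identifiers, the function names, and both thresholds are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon: surface form -> placeholder concept ID (not real UMLS CUIs).
LEXICON = {
    "insulin": "CUI_INSULIN",
    "metformin": "CUI_METFORMIN",
}


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for a learned model."""
    return SequenceMatcher(None, a, b).ratio()


def candidates(tokens, lexicon, threshold=0.6):
    """High-recall step: keep every (token, term) pair above a loose threshold."""
    found = []
    for tok in tokens:
        for term, cui in lexicon.items():
            score = similarity(tok, term)
            if score >= threshold:
                found.append((tok, term, cui, score))
    return found


def filter_candidates(cands, threshold=0.85):
    """Filtering step: per token, keep the best-scoring term above a strict threshold."""
    best = {}
    for tok, term, cui, score in cands:
        if score >= threshold and score > best.get(tok, (None, None, 0.0))[2]:
            best[tok] = (term, cui, score)
    return {tok: cui for tok, (term, cui, score) in best.items()}


if __name__ == "__main__":
    post = ["insulin", "metformine", "install", "sugar"]
    print(filter_candidates(candidates(post, LEXICON)))
```

The loose first threshold trades precision for recall, and the stricter second pass restores precision; in the record's description the filter is a separate step, so any real system would replace both scoring functions.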