
Enhancing unsupervised medical entity linking with multi-instance learning

Bibliographic Details
Main authors: Yan, Cheng, Zhang, Yuanzhe, Liu, Kang, Zhao, Jun, Shi, Yafei, Liu, Shengping
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596894/
https://www.ncbi.nlm.nih.gov/pubmed/34789262
http://dx.doi.org/10.1186/s12911-021-01654-z
Collection: PubMed
Full description:
BACKGROUND: Large volumes of medical text contain many medical mentions. To make use of these mentions, a prerequisite step is to link them to a medical-domain knowledge base (KB). Linking each mention to a well-defined, unambiguous KB entry is a necessary part of downstream applications such as disease diagnosis and drug prescription. The need is even more urgent in colloquial, informal settings such as online medical consultation, where medical language is more casual and vague. In this article, we propose an unsupervised method for linking Chinese medical symptom mentions in colloquial text to the ICD10 classification.
METHODS: We propose an unsupervised entity linking (EL) model based on multi-instance learning (MIL). Our approach builds on a basic unsupervised entity linking method (named BEL), which in this paper is an embedding similarity-based EL model, and uses the MIL training paradigm to boost the performance of BEL. First, we construct a dataset from a large-scale, unlabeled Chinese medical consultation corpus with the help of BEL. Next, we use a variety of encoders to obtain representations of the mention contexts and of the ICD10 entities. These representations are then fed into a ranking network that scores the candidate entities.
RESULTS: We evaluate the proposed model on a test dataset annotated by professional doctors. Our method achieves 60.34% accuracy, exceeding the basic BEL model by 1.72%.
CONCLUSIONS: We propose an unsupervised entity linking method for the medical domain that is trained in the MIL manner, and we annotate a test set for evaluation. The experimental results show that our model outperforms the basic BEL model, and they offer insight for future research.
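
The METHODS paragraph describes a two-stage idea: an embedding-similarity linker (BEL) pseudo-labels unlabeled mentions, and MIL training then learns from those noisy groups. The sketch below illustrates that idea only under stated assumptions: toy 2-dimensional vectors stand in for real encoder outputs; the ICD-10-style codes (`R51`, `R05`) and mention ids are hypothetical; grouping mentions into bags by BEL's top-1 entity and max-pooling bag scores are common MIL choices, not the authors' confirmed design; and plain cosine ranking replaces the paper's learned ranking network. The function names are illustrative, not from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(mention_vec, entity_vecs):
    """Score every KB entity against a mention and sort best-first.
    Stand-in for the paper's ranking network: plain cosine, no learned weights."""
    scored = [(eid, cosine(mention_vec, vec)) for eid, vec in entity_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def bel_link(mention_vec, entity_vecs):
    """BEL-style baseline: return the top-1 (entity_id, score) by embedding similarity."""
    return rank_candidates(mention_vec, entity_vecs)[0]


def build_bags(mentions, entity_vecs):
    """Hypothetical MIL dataset construction: group unlabeled mentions into bags
    keyed by the baseline's top-1 entity. The usual MIL assumption is that at
    least one instance in each bag truly refers to the bag's entity."""
    bags = defaultdict(list)
    for mention_id, vec in mentions.items():
        eid, _ = bel_link(vec, entity_vecs)
        bags[eid].append(mention_id)
    return dict(bags)


def bag_score(instance_scores):
    """Max-pooling aggregation (one common MIL choice): a bag scores as its best instance."""
    return max(instance_scores)


def accuracy(predicted, gold):
    """Fraction of annotated mentions whose predicted entity matches the gold label."""
    hits = sum(1 for mention_id, eid in gold.items() if predicted.get(mention_id) == eid)
    return hits / len(gold)


# Toy usage: 2-d vectors stand in for encoder outputs; codes and ids are hypothetical.
entity_vecs = {"R51": [1.0, 0.0], "R05": [0.0, 1.0]}
mentions = {"m1": [0.9, 0.1], "m2": [0.2, 0.8], "m3": [0.7, 0.3]}
print(build_bags(mentions, entity_vecs))  # {'R51': ['m1', 'm3'], 'R05': ['m2']}

predicted = {m: bel_link(v, entity_vecs)[0] for m, v in mentions.items()}
gold = {"m1": "R51", "m2": "R05", "m3": "R05"}  # hypothetical doctor annotations
print(round(accuracy(predicted, gold), 3))  # 0.667
```

Max pooling is only one way to aggregate instance scores; attention-weighted pooling over a bag's instances is another common MIL option, and the abstract does not state which aggregation the authors use.
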
Record ID: pubmed-8596894
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Med Inform Decis Mak (section: Research)
Published online: BioMed Central, 2021-11-16
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Rights: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.