Enhancing unsupervised medical entity linking with multi-instance learning
Main authors: | Yan, Cheng; Zhang, Yuanzhe; Liu, Kang; Zhao, Jun; Shi, Yafei; Liu, Shengping |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596894/ https://www.ncbi.nlm.nih.gov/pubmed/34789262 http://dx.doi.org/10.1186/s12911-021-01654-z |
author | Yan, Cheng; Zhang, Yuanzhe; Liu, Kang; Zhao, Jun; Shi, Yafei; Liu, Shengping |
---|---|
collection | PubMed |
description | BACKGROUND: Many medical mentions can be extracted from large volumes of medical text. To make use of these mentions, a prerequisite step is to link them to a medical-domain knowledge base (KB). Linking mentions to a well-defined, unambiguous KB is a necessary part of downstream applications such as disease diagnosis and drug prescription. The need is even more urgent in colloquial, informal settings such as online medical consultation, where medical language is more casual and vague. In this article, we propose an unsupervised method to link Chinese medical symptom mentions to the ICD10 classification in a colloquial setting. METHODS: We propose an unsupervised entity linking (EL) model using multi-instance learning (MIL). Our approach builds on a basic unsupervised entity linking method (named BEL), which in this paper is an embedding-similarity-based EL model, and uses the MIL training paradigm to boost the performance of BEL. First, we construct a dataset from a large-scale, unlabeled Chinese medical consultation corpus with the help of BEL. Next, we use a variety of encoders to obtain representations of the mention contexts and of the ICD10 entities. These representations are then fed into a ranking network that scores candidate entities. RESULTS: We evaluate the proposed model on a test set annotated by professional doctors. Our method achieves 60.34% accuracy, exceeding the base model BEL by 1.72%. CONCLUSIONS: We propose an unsupervised method for entity linking in the medical domain that is trained in an MIL manner, and we annotate a test set for evaluation. The experimental results show that our model outperforms the base model BEL and provides insight for future research. |
format | Online Article Text |
id | pubmed-8596894 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8596894 2021-11-17 Enhancing unsupervised medical entity linking with multi-instance learning Yan, Cheng; Zhang, Yuanzhe; Liu, Kang; Zhao, Jun; Shi, Yafei; Liu, Shengping BMC Med Inform Decis Mak Research BioMed Central 2021-11-16 /pmc/articles/PMC8596894/ /pubmed/34789262 http://dx.doi.org/10.1186/s12911-021-01654-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Enhancing unsupervised medical entity linking with multi-instance learning |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596894/ https://www.ncbi.nlm.nih.gov/pubmed/34789262 http://dx.doi.org/10.1186/s12911-021-01654-z |
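
The description field above names three ingredients: BEL (an embedding-similarity-based linker), a training set built from an unlabeled corpus with BEL's help, and MIL training of a ranking network. This record does not specify the encoders, how bags are formed, or the loss. The following Python sketch therefore only illustrates one common way these pieces can fit together, under loudly stated assumptions: plain cosine similarity stands in for BEL, unlabeled instances are grouped into bags by BEL's top-1 prediction, and a max-pooled bag score feeds a hinge ranking loss. All function names, the margin of 0.5, and the toy vectors are hypothetical, and `rank_candidates` is a fixed similarity rather than the paper's learned ranking network.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(mention: Vector, entities: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """BEL-style stand-in (assumption): score every KB entity by cosine
    similarity to the mention-context vector, best first."""
    scored = [(code, cosine(mention, vec)) for code, vec in entities.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def link(mention: Vector, entities: Dict[str, Vector]) -> str:
    """Return the top-ranked entity code for one mention-context vector."""
    return rank_candidates(mention, entities)[0][0]


def build_bags(instances: List[Vector], entities: Dict[str, Vector]) -> Dict[str, List[Vector]]:
    """Hypothetical bag construction: pseudo-label each unlabeled instance with
    BEL's top-1 entity and group instances sharing a pseudo-label into one bag."""
    bags: Dict[str, List[Vector]] = defaultdict(list)
    for vec in instances:
        bags[link(vec, entities)].append(vec)
    return dict(bags)


def bag_score(bag: List[Vector], entity: Vector) -> float:
    """MIL max-pooling: the best-matching instance represents the whole bag,
    so noisy pseudo-labeled instances in the bag do not drag the score down."""
    return max(cosine(x, entity) for x in bag)


def bag_hinge_loss(bag: List[Vector], positive: Vector, negative: Vector,
                   margin: float = 0.5) -> float:
    """Ranking loss: the bag's pseudo-label entity should outscore a negative
    entity by at least `margin`. A trained model would backpropagate this
    through learned encoders; with fixed vectors only the value is computed."""
    return max(0.0, margin - bag_score(bag, positive) + bag_score(bag, negative))
```

For example, with toy two-dimensional vectors for the ICD-10 codes R51 (headache) and R05 (cough), `link([0.9, 0.1], ...)` picks R51. The max-pooling choice is what makes this "multi-instance": a bag is judged by its most convincing instance, which tolerates some wrong pseudo-labels from BEL.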