Improving the Named Entity Recognition of Chinese Electronic Medical Records by Combining Domain Dictionary and Rules

Electronic medical records are an integral part of medical texts, and entity recognition in them has prompted many studies and many entity extraction methods. In this paper, an entity extraction model is proposed to extract entities from Chinese Electronic Medical Records...

Bibliographic Details
Main Authors: Chen, Xianglong; Ouyang, Chunping; Liu, Yongbin; Bu, Yi
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7215438/
https://www.ncbi.nlm.nih.gov/pubmed/32295174
http://dx.doi.org/10.3390/ijerph17082687
author Chen, Xianglong
Ouyang, Chunping
Liu, Yongbin
Bu, Yi
collection PubMed
description Electronic medical records are an integral part of medical texts, and entity recognition in them has prompted many studies and many entity extraction methods. In this paper, an entity extraction model is proposed to extract entities from Chinese Electronic Medical Records (CEMR). In the input layer of the model, we use word embeddings and dictionary feature embeddings as input vectors, where the word embedding consists of a character representation and a word representation. The input vectors are then fed to a bidirectional long short-term memory (Bi-LSTM) network to capture contextual features. Finally, a conditional random field (CRF) is employed to capture dependencies between neighboring tags. On the body classification task, the F1 value reached 90.65%; on the anatomic region recognition task, it reached 93.89%. On both tasks, our model outperformed state-of-the-art models such as Bi-LSTM-CRF, Bi-LSTM-Attention, and Vote. The experiments also show that our model handles low-frequency entities and unknown entities well; with a small training dataset, our method showed a 2–4% improvement in F1 value over the basic Bi-LSTM-CRF models. Additionally, on the anatomic region recognition task, we combined the proposed entity extraction model with 12 rules we designed and a domain dictionary. In this task, the weighted F1 value for the extraction of the three specific entities reached 84.36%.
format Online
Article
Text
id pubmed-7215438
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7215438 2020-05-18 Chen, Xianglong; Ouyang, Chunping; Liu, Yongbin; Bu, Yi. Int J Environ Res Public Health. Article. MDPI 2020-04-14 2020-04 /pmc/articles/PMC7215438/ /pubmed/32295174 http://dx.doi.org/10.3390/ijerph17082687 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Improving the Named Entity Recognition of Chinese Electronic Medical Records by Combining Domain Dictionary and Rules
topic Article
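
The description above says the input layer uses dictionary feature embeddings, but this record does not say how the dictionary features are derived. One plausible preprocessing step, assumed here purely for illustration, is forward maximum matching against a domain dictionary, which gives each character a B/I/O dictionary tag that an embedding layer could then consume. The function name `dictionary_features`, the `max_len` parameter, and the toy dictionary are hypothetical and are not taken from the paper.

```python
from typing import List, Set


def dictionary_features(chars: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Tag each character B/I/O using forward maximum matching (illustrative assumption)."""
    tags = ["O"] * len(chars)
    i = 0
    while i < len(chars):
        matched = 0
        # Try the longest candidate first so longer terms win over their prefixes.
        for length in range(min(max_len, len(chars) - i), 0, -1):
            if chars[i:i + length] in dictionary:
                matched = length
                break
        if matched:
            tags[i] = "B"  # first character of a dictionary term
            for j in range(i + 1, i + matched):
                tags[j] = "I"  # remaining characters of the same term
            i += matched
        else:
            i += 1  # no dictionary term starts here; leave the tag as "O"
    return tags


if __name__ == "__main__":
    # Hypothetical toy dictionary of anatomical terms.
    toy_dictionary = {"腹部", "右下腹", "头"}
    print(dictionary_features("右下腹疼痛", toy_dictionary))
```

In a model of the kind the description outlines, each tag would typically be mapped to a small trainable vector and concatenated with the character and word representations before the Bi-LSTM layer; that wiring is likewise an assumption here.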