A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records

BACKGROUND: Named Entity Recognition (NER), a key step in the extraction of health information, has encountered many challenges in Chinese Electronic Medical Records (EMRs). Firstly, the casual use of Chinese abbreviations and doctors’ personal style may result in multiple expressions of...

Bibliographic Details
Main Authors:	Cai, Xiaoling, Dong, Shoubin, Hu, Jinlong
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454585/ https://www.ncbi.nlm.nih.gov/pubmed/30961622 http://dx.doi.org/10.1186/s12911-019-0762-7

_version_	1783409565051650048
author	Cai, Xiaoling Dong, Shoubin Hu, Jinlong
author_facet	Cai, Xiaoling Dong, Shoubin Hu, Jinlong
author_sort	Cai, Xiaoling
collection	PubMed
description	BACKGROUND: Named Entity Recognition (NER), a key step in the extraction of health information, has encountered many challenges in Chinese Electronic Medical Records (EMRs). Firstly, the casual use of Chinese abbreviations and doctors’ personal style may result in multiple expressions of the same entity, and we lack a common Chinese medical dictionary to perform accurate entity extraction. Secondly, an electronic medical record contains entities from a variety of categories, and the length of entities varies greatly across categories, which increases the difficulty of Chinese NER. Therefore, entity boundary detection becomes the key to accurate entity extraction from Chinese EMRs, and we need to develop a model that recognizes entities of varying length without relying on any medical dictionary. METHODS: In this study, we incorporate part-of-speech (POS) information into the deep learning model to improve the accuracy of Chinese entity boundary detection. To avoid incorrect POS tagging of long entities, we propose a method called reduced POS tagging, which retains the tags of general words but not those of apparent medical entities. The model proposed in this paper, named SM-LSTM-CRF, consists of three layers: a self-matching attention layer, which calculates the relevance of each character to the entire sentence; an LSTM (Long Short-Term Memory) layer, which captures the context features of each character; and a CRF (Conditional Random Field) layer, which labels characters based on their features and transition rules. RESULTS: Experimental results on a Chinese EMR dataset show that the F1 value of SM-LSTM-CRF is 2.59% higher than that of LSTM-CRF. Adding the POS feature to the model yields an improvement of about 7.74% in F1. Reduced POS tagging reduces false tagging of long entities, which increases the F1 value by 2.42% and achieves an F1 score of 80.07%. CONCLUSIONS: The POS feature marked by reduced POS tagging, together with the self-matching attention mechanism, gives the model a firm grasp of entity boundaries and performs well in the recognition of clinical entities.
format	Online Article Text
id	pubmed-6454585
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6454585 2019-04-17 A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records Cai, Xiaoling Dong, Shoubin Hu, Jinlong BMC Med Inform Decis Mak Research BioMed Central 2019-04-09 /pmc/articles/PMC6454585/ /pubmed/30961622 http://dx.doi.org/10.1186/s12911-019-0762-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Cai, Xiaoling Dong, Shoubin Hu, Jinlong A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records
title	A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records
title_sort	deep learning model incorporating part of speech and self-matching attention for named entity recognition of chinese electronic medical records
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454585/ https://www.ncbi.nlm.nih.gov/pubmed/30961622 http://dx.doi.org/10.1186/s12911-019-0762-7
