
Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations

BACKGROUND: Clinical named entity recognition (CNER), whose goal is to automatically identify clinical entities in electronic medical records (EMRs), is an important research direction of clinical text data mining and information extraction. The promotion of CNER can provide support for clinical dec...

Bibliographic Details
Main Authors: Li, Yongbin, Wang, Xiaohua, Hui, Linhu, Zou, Liping, Li, Hongjin, Xu, Luo, Liu, Weihai
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7501578/
https://www.ncbi.nlm.nih.gov/pubmed/32885786
http://dx.doi.org/10.2196/19848
author Li, Yongbin
Wang, Xiaohua
Hui, Linhu
Zou, Liping
Li, Hongjin
Xu, Luo
Liu, Weihai
author_sort Li, Yongbin
collection PubMed
description BACKGROUND: Clinical named entity recognition (CNER), whose goal is to automatically identify clinical entities in electronic medical records (EMRs), is an important research direction of clinical text data mining and information extraction. The promotion of CNER can provide support for clinical decision making and medical knowledge base construction, which could then improve overall medical quality. Compared with English CNER, and due to the complexity of Chinese word segmentation and grammar, Chinese CNER was implemented later and is more challenging. OBJECTIVE: With the development of distributed representation and deep learning, a series of models have been applied in Chinese CNER. Different from the English version, Chinese CNER is mainly divided into character-based and word-based methods that cannot make comprehensive use of EMR information and cannot solve the problem of ambiguity in word representation. METHODS: In this paper, we propose a lattice long short-term memory (LSTM) model combined with a variant contextualized character representation and a conditional random field (CRF) layer for Chinese CNER: the Embeddings from Language Models (ELMo)-lattice-LSTM-CRF model. The lattice LSTM model can effectively utilize the information from characters and words in Chinese EMRs; in addition, the variant ELMo model uses Chinese characters as input instead of the character-encoding layer of the ELMo model, so as to learn domain-specific contextualized character embeddings. RESULTS: We evaluated our method using two Chinese CNER datasets from the China Conference on Knowledge Graph and Semantic Computing (CCKS): the CCKS-2017 CNER dataset and the CCKS-2019 CNER dataset. We obtained F1 scores of 90.13% and 85.02% on the test sets of these two datasets, respectively. CONCLUSIONS: Our results show that our proposed method is effective in Chinese CNER. 
In addition, the results of our experiments show that variant contextualized character representations can significantly improve the performance of the model.
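
The METHODS summary above says the lattice LSTM uses information from both characters and words. As a minimal sketch of the input side only (not the authors' implementation), the following Python enumerates every lexicon word that matches a span of a character sequence, which is the set of word paths a lattice adds on top of the plain character chain. The function names `lattice_word_spans` and `group_by_end`, the `max_len` cutoff, and the four-word lexicon are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (start, end, word) with end exclusive, so chars[start:end] == word
Span = Tuple[int, int, str]


def lattice_word_spans(chars: str, lexicon: Set[str], max_len: int = 4) -> List[Span]:
    """Find every lexicon word of length 2..max_len inside the character sequence.

    Overlapping and nested matches are all kept: a lattice does not commit
    to a single segmentation of the sentence.
    """
    spans: List[Span] = []
    n = len(chars)
    for start in range(n):
        longest_end = min(n, start + max_len)
        for end in range(start + 2, longest_end + 1):
            word = chars[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def group_by_end(spans: List[Span]) -> Dict[int, List[str]]:
    """Group matched words by the index of their last character."""
    by_end: Dict[int, List[str]] = defaultdict(list)
    for _start, end, word in spans:
        by_end[end - 1].append(word)
    return dict(by_end)


# Hypothetical toy lexicon and sentence fragment (assumptions for illustration).
toy_lexicon = {"右下", "下肺", "肺炎", "右下肺"}
toy_spans = lattice_word_spans("右下肺炎", toy_lexicon)
print(toy_spans)
print(group_by_end(toy_spans))
```

In lattice LSTM models as they are usually described, each matched word contributes an extra memory cell that is merged into the character cell at the word's final character, which is why the sketch groups matches by end position. The contextualized character embeddings and the CRF layer named in the abstract are outside the scope of this sketch.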
format Online
Article
Text
id pubmed-7501578
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-75015782020-09-30 Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations Li, Yongbin Wang, Xiaohua Hui, Linhu Zou, Liping Li, Hongjin Xu, Luo Liu, Weihai JMIR Med Inform Original Paper BACKGROUND: Clinical named entity recognition (CNER), whose goal is to automatically identify clinical entities in electronic medical records (EMRs), is an important research direction of clinical text data mining and information extraction. The promotion of CNER can provide support for clinical decision making and medical knowledge base construction, which could then improve overall medical quality. Compared with English CNER, and due to the complexity of Chinese word segmentation and grammar, Chinese CNER was implemented later and is more challenging. OBJECTIVE: With the development of distributed representation and deep learning, a series of models have been applied in Chinese CNER. Different from the English version, Chinese CNER is mainly divided into character-based and word-based methods that cannot make comprehensive use of EMR information and cannot solve the problem of ambiguity in word representation. METHODS: In this paper, we propose a lattice long short-term memory (LSTM) model combined with a variant contextualized character representation and a conditional random field (CRF) layer for Chinese CNER: the Embeddings from Language Models (ELMo)-lattice-LSTM-CRF model. The lattice LSTM model can effectively utilize the information from characters and words in Chinese EMRs; in addition, the variant ELMo model uses Chinese characters as input instead of the character-encoding layer of the ELMo model, so as to learn domain-specific contextualized character embeddings. 
RESULTS: We evaluated our method using two Chinese CNER datasets from the China Conference on Knowledge Graph and Semantic Computing (CCKS): the CCKS-2017 CNER dataset and the CCKS-2019 CNER dataset. We obtained F1 scores of 90.13% and 85.02% on the test sets of these two datasets, respectively. CONCLUSIONS: Our results show that our proposed method is effective in Chinese CNER. In addition, the results of our experiments show that variant contextualized character representations can significantly improve the performance of the model. JMIR Publications 2020-09-04 /pmc/articles/PMC7501578/ /pubmed/32885786 http://dx.doi.org/10.2196/19848 Text en ©Yongbin Li, Xiaohua Wang, Linhu Hui, Liping Zou, Hongjin Li, Luo Xu, Weihai Liu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 04.09.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7501578/
https://www.ncbi.nlm.nih.gov/pubmed/32885786
http://dx.doi.org/10.2196/19848