
Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation

BACKGROUND: Medical entity recognition is a key technology that supports the development of smart medicine. Methods for English medical entity recognition have advanced considerably, but progress in Chinese has been slow. Because of the complexity...


Bibliographic Details
Main Authors: Zhang, Zhichang, Zhu, Lin, Yu, Peilin
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7235813/
https://www.ncbi.nlm.nih.gov/pubmed/32364514
http://dx.doi.org/10.2196/17637
_version_ 1783536041637969920
author Zhang, Zhichang
Zhu, Lin
Yu, Peilin
author_facet Zhang, Zhichang
Zhu, Lin
Yu, Peilin
author_sort Zhang, Zhichang
collection PubMed
description BACKGROUND: Medical entity recognition is a key technology that supports the development of smart medicine. Methods for English medical entity recognition have advanced considerably, but progress in Chinese has been slow. Because of the complexity of the Chinese language and the scarcity of annotated corpora, existing methods rely on simple neural networks, which can neither effectively extract the deep semantic representations of electronic medical records (EMRs) nor cope with scarce medical corpora. We therefore developed a new Chinese EMR (CEMR) dataset with six types of entities and proposed a multi-level representation learning model based on Bidirectional Encoder Representations from Transformers (BERT) for Chinese medical entity recognition. OBJECTIVE: This study aimed to improve the performance of the language model by having it learn multi-level representations and to recognize Chinese medical entities. METHODS: We investigated the pretrained language representation model and found that utilizing information not only from the final layer but also from intermediate layers affects performance on the Chinese medical entity recognition task. We therefore proposed a multi-level representation learning model for entity recognition in Chinese EMRs. Specifically, we first used the BERT language model to extract semantic representations. Then, a multi-head attention mechanism was leveraged to automatically extract deeper semantic information from each layer. Finally, the semantic representations obtained from multi-level representation extraction were used as the final semantic context embedding for each token, and a softmax layer was used to predict the entity tags. RESULTS: The best F1 score reached in our experiments was 82.11% on the CEMR dataset, and the F1 score increased further to 83.18% on the CCKS (China Conference on Knowledge Graph and Semantic Computing) 2018 benchmark dataset.
Comparative experiments showed that the proposed method outperforms previous work and establishes a new state of the art. CONCLUSIONS: The multi-level representation learning model is proposed as a method for the Chinese EMR entity recognition task. Experiments on two clinical datasets demonstrate the usefulness of using the multi-head attention mechanism to extract multi-level representations as part of the language model.
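The pipeline described in the METHODS section (per-layer BERT representations, attention-based aggregation across layers, then softmax tag prediction) can be sketched as follows. This is a minimal NumPy sketch of the layer-aggregation idea, not the authors' implementation: random tensors stand in for BERT layer outputs, the attention is simplified to a single learned scoring vector over layers rather than the paper's multi-head mechanism, and all dimensions (12 layers, sequence length 8, hidden size 32, 13 BIO tags for six entity types) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for BERT layer outputs: (num_layers, seq_len, hidden).
# In the real model these would come from the pretrained encoder.
num_layers, seq_len, hidden, num_tags = 12, 8, 32, 13
layer_states = rng.normal(size=(num_layers, seq_len, hidden))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Simplified attention over layers: score each layer's representation
# of each token with a (randomly initialized here) scoring vector,
# then normalize the scores across layers.
w_score = rng.normal(size=(hidden,))
scores = layer_states @ w_score                 # (num_layers, seq_len)
alpha = softmax(scores, axis=0)                 # attention weights over layers

# Weighted sum across layers gives the multi-level context embedding
# for each token.
context = (alpha[..., None] * layer_states).sum(axis=0)  # (seq_len, hidden)

# Tag classifier: linear projection followed by softmax over entity tags.
W_tag = rng.normal(size=(hidden, num_tags))
tag_probs = softmax(context @ W_tag)            # (seq_len, num_tags)
pred_tags = tag_probs.argmax(axis=-1)           # one tag index per token
```

A trained model would learn `w_score` and `W_tag` (and use multiple attention heads) end to end; the point of the sketch is only the data flow from stacked layer representations to per-token tag probabilities.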
format Online
Article
Text
id pubmed-7235813
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-72358132020-06-01 Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation Zhang, Zhichang Zhu, Lin Yu, Peilin JMIR Med Inform Original Paper JMIR Publications 2020-05-04 /pmc/articles/PMC7235813/ /pubmed/32364514 http://dx.doi.org/10.2196/17637 Text en ©Zhichang Zhang, Lin Zhu, Peilin Yu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 04.05.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Zhang, Zhichang
Zhu, Lin
Yu, Peilin
Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation
title Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation
title_full Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation
title_fullStr Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation
title_full_unstemmed Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation
title_short Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation
title_sort multi-level representation learning for chinese medical entity recognition: model development and validation
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7235813/
https://www.ncbi.nlm.nih.gov/pubmed/32364514
http://dx.doi.org/10.2196/17637
work_keys_str_mv AT zhangzhichang multilevelrepresentationlearningforchinesemedicalentityrecognitionmodeldevelopmentandvalidation
AT zhulin multilevelrepresentationlearningforchinesemedicalentityrecognitionmodeldevelopmentandvalidation
AT yupeilin multilevelrepresentationlearningforchinesemedicalentityrecognitionmodeldevelopmentandvalidation