Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation
BACKGROUND: Medical entity recognition is a key technology that supports the development of smart medicine. Existing methods for English medical entity recognition have advanced considerably, but progress on Chinese has been slow. Because of the complexity...
Main Authors: | Zhang, Zhichang; Zhu, Lin; Yu, Peilin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications, 2020 |
Subjects: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7235813/ https://www.ncbi.nlm.nih.gov/pubmed/32364514 http://dx.doi.org/10.2196/17637 |
_version_ | 1783536041637969920 |
---|---|
author | Zhang, Zhichang; Zhu, Lin; Yu, Peilin |
author_facet | Zhang, Zhichang; Zhu, Lin; Yu, Peilin |
author_sort | Zhang, Zhichang |
collection | PubMed |
description | BACKGROUND: Medical entity recognition is a key technology that supports the development of smart medicine. Existing methods for English medical entity recognition have advanced considerably, but progress on Chinese has been slow. Because of the complexity of the Chinese language and the scarcity of annotated corpora, existing Chinese methods rely on simple neural networks, which can neither effectively extract deep semantic representations from electronic medical records (EMRs) nor be applied to scarce medical corpora. We therefore developed a new Chinese EMR (CEMR) dataset with six types of entities and proposed a multi-level representation learning model based on Bidirectional Encoder Representations from Transformers (BERT) for Chinese medical entity recognition. OBJECTIVE: This study aimed to improve the performance of the language model by having it learn multi-level representations and to recognize Chinese medical entities. METHODS: We investigated the pretrained language representation model and found that utilizing information not only from the final layer but also from intermediate layers improves performance on the Chinese medical entity recognition task. We therefore proposed a multi-level representation learning model for entity recognition in Chinese EMRs. Specifically, we first used the BERT language model to extract semantic representations. Then, a multi-head attention mechanism automatically extracted deeper semantic information from each layer. Finally, the semantic representations produced by multi-level representation extraction were used as the final semantic context embedding for each token, and softmax predicted the entity tags. RESULTS: The best F1 score was 82.11% on the CEMR dataset, and the F1 score on the CCKS (China Conference on Knowledge Graph and Semantic Computing) 2018 benchmark dataset increased further to 83.18%. Comparative experiments showed that our method outperforms previous work and establishes a new state of the art. CONCLUSIONS: The multi-level representation learning model is proposed for the Chinese EMR entity recognition task. Experiments on two clinical datasets demonstrate the usefulness of the multi-head attention mechanism for extracting multi-level representations as part of the language model. |
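The METHODS description above (hidden states taken from every BERT layer, fused by an attention mechanism, then a softmax tagger per token) can be sketched in outline. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: a single learned scoring vector stands in for full multi-head attention, random arrays stand in for BERT hidden states and trained weights, and the tag set is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_layers(layer_states, w_query):
    # layer_states: (num_layers, seq_len, hidden) hidden states from every
    # transformer layer, not just the final one.
    # w_query: (hidden,) scoring vector; attention weights are computed
    # over the layer axis, separately for each token.
    scores = layer_states @ w_query                           # (num_layers, seq_len)
    weights = softmax(scores, axis=0)                         # per-token layer weights
    fused = (weights[..., None] * layer_states).sum(axis=0)   # (seq_len, hidden)
    return fused

def predict_tags(fused, w_out, tags):
    # Project each token's fused embedding to tag logits, softmax, argmax.
    probs = softmax(fused @ w_out, axis=-1)                   # (seq_len, num_tags)
    return [tags[i] for i in probs.argmax(axis=-1)]

rng = np.random.default_rng(0)
num_layers, seq_len, hidden = 12, 6, 32
tags = ["O", "B-Disease", "I-Disease", "B-Drug", "I-Drug"]  # hypothetical tag set
layer_states = rng.normal(size=(num_layers, seq_len, hidden))
fused = fuse_layers(layer_states, rng.normal(size=hidden))
pred = predict_tags(fused, rng.normal(size=(hidden, len(tags))), tags)
print(pred)
```

In the real model the fusion weights and output projection are learned jointly with fine-tuning; the sketch only shows the data flow from per-layer representations to per-token tag predictions.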
format | Online Article Text |
id | pubmed-7235813 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-7235813 2020-06-01 Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation Zhang, Zhichang; Zhu, Lin; Yu, Peilin JMIR Med Inform Original Paper BACKGROUND: Medical entity recognition is a key technology that supports the development of smart medicine. Existing methods for English medical entity recognition have advanced considerably, but progress on Chinese has been slow. Because of the complexity of the Chinese language and the scarcity of annotated corpora, existing Chinese methods rely on simple neural networks, which can neither effectively extract deep semantic representations from electronic medical records (EMRs) nor be applied to scarce medical corpora. We therefore developed a new Chinese EMR (CEMR) dataset with six types of entities and proposed a multi-level representation learning model based on Bidirectional Encoder Representations from Transformers (BERT) for Chinese medical entity recognition. OBJECTIVE: This study aimed to improve the performance of the language model by having it learn multi-level representations and to recognize Chinese medical entities. METHODS: We investigated the pretrained language representation model and found that utilizing information not only from the final layer but also from intermediate layers improves performance on the Chinese medical entity recognition task. We therefore proposed a multi-level representation learning model for entity recognition in Chinese EMRs. Specifically, we first used the BERT language model to extract semantic representations. Then, a multi-head attention mechanism automatically extracted deeper semantic information from each layer. Finally, the semantic representations produced by multi-level representation extraction were used as the final semantic context embedding for each token, and softmax predicted the entity tags. RESULTS: The best F1 score was 82.11% on the CEMR dataset, and the F1 score on the CCKS (China Conference on Knowledge Graph and Semantic Computing) 2018 benchmark dataset increased further to 83.18%. Comparative experiments showed that our method outperforms previous work and establishes a new state of the art. CONCLUSIONS: The multi-level representation learning model is proposed for the Chinese EMR entity recognition task. Experiments on two clinical datasets demonstrate the usefulness of the multi-head attention mechanism for extracting multi-level representations as part of the language model. JMIR Publications 2020-05-04 /pmc/articles/PMC7235813/ /pubmed/32364514 http://dx.doi.org/10.2196/17637 Text en ©Zhichang Zhang, Lin Zhu, Peilin Yu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 04.05.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
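The F1 scores reported in the record (82.11% on CEMR, 83.18% on CCKS 2018) are entity-level scores. The record does not include the authors' evaluation script, so the following is a generic sketch of strict span-level precision/recall/F1 over BIO tag sequences, the usual convention for this task; the tag names are hypothetical.

```python
def spans(tags):
    # Extract (start, end, type) entity spans from a BIO tag sequence.
    # A stray I- tag with no preceding B- starts a new span (lenient reading).
    out, start, typ = set(), None, None
    for i, t in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        boundary = t == "O" or t.startswith("B-") or (t.startswith("I-") and t[2:] != typ)
        if boundary and start is not None:
            out.add((start, i, typ))
            start, typ = None, None
        if t.startswith("B-") or (t.startswith("I-") and start is None):
            start, typ = i, t[2:]
    return out

def entity_f1(gold, pred):
    # Strict matching: a predicted span counts only if its boundaries
    # and entity type both agree with a gold span.
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

gold = ["B-Disease", "I-Disease", "O", "B-Drug"]
pred = ["B-Disease", "I-Disease", "O", "O"]
print(round(entity_f1(gold, pred), 4))  # → 0.6667 (1 of 1 predictions correct, 1 of 2 entities found)
```

Libraries such as seqeval implement the same strict-matching convention; the sketch just makes the counting explicit.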
spellingShingle | Original Paper Zhang, Zhichang Zhu, Lin Yu, Peilin Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_full | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_fullStr | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_full_unstemmed | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_short | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_sort | multi-level representation learning for chinese medical entity recognition: model development and validation |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7235813/ https://www.ncbi.nlm.nih.gov/pubmed/32364514 http://dx.doi.org/10.2196/17637 |
work_keys_str_mv | AT zhangzhichang multilevelrepresentationlearningforchinesemedicalentityrecognitionmodeldevelopmentandvalidation AT zhulin multilevelrepresentationlearningforchinesemedicalentityrecognitionmodeldevelopmentandvalidation AT yupeilin multilevelrepresentationlearningforchinesemedicalentityrecognitionmodeldevelopmentandvalidation |