Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation
BACKGROUND: Medical entity recognition is a key technology that supports the development of smart medicine. Existing methods for English medical entity recognition have advanced considerably, but progress on Chinese has been slow. Because of the complexity...
Main Authors: | Zhang, Zhichang; Zhu, Lin; Yu, Peilin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications, 2020 |
Subjects: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7235813/ https://www.ncbi.nlm.nih.gov/pubmed/32364514 http://dx.doi.org/10.2196/17637 |
_version_ | 1783536041637969920 |
---|---|
author | Zhang, Zhichang; Zhu, Lin; Yu, Peilin |
author_facet | Zhang, Zhichang; Zhu, Lin; Yu, Peilin |
author_sort | Zhang, Zhichang |
collection | PubMed |
description | BACKGROUND: Medical entity recognition is a key technology that supports the development of smart medicine. Existing methods for English medical entity recognition have advanced considerably, but progress on Chinese has been slow. Because of the complexity of the Chinese language and the scarcity of annotated corpora, existing Chinese methods rely on simple neural networks, which can neither effectively extract deep semantic representations from electronic medical records (EMRs) nor be applied to scarce medical corpora. We therefore developed a new Chinese EMR (CEMR) dataset with six types of entities and proposed a multi-level representation learning model based on Bidirectional Encoder Representations from Transformers (BERT) for Chinese medical entity recognition. OBJECTIVE: This study aimed to improve the performance of the language model by having it learn multi-level representations and to recognize Chinese medical entities. METHODS: We investigated the pretrained language representation model and found that utilizing information not only from the final layer but also from intermediate layers improves performance on the Chinese medical entity recognition task. We therefore proposed a multi-level representation learning model for entity recognition in Chinese EMRs. Specifically, we first used the BERT language model to extract semantic representations. Then, a multi-head attention mechanism automatically extracted deeper semantic information from each layer. Finally, the semantic representations produced by multi-level representation extraction were used as the final semantic context embedding for each token, and softmax predicted the entity tags. RESULTS: The best F1 score was 82.11% on the CEMR dataset, and the F1 score on the CCKS (China Conference on Knowledge Graph and Semantic Computing) 2018 benchmark dataset increased further to 83.18%. Comparative experiments showed that our method outperforms previous work and establishes a new state of the art. CONCLUSIONS: The multi-level representation learning model is proposed for the Chinese EMR entity recognition task. Experiments on two clinical datasets demonstrate the usefulness of the multi-head attention mechanism for extracting multi-level representations as part of the language model. |
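The METHODS description above (hidden states taken from every BERT layer, fused by an attention mechanism, then a softmax tagger per token) can be sketched in outline. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: a single learned scoring vector stands in for full multi-head attention, random arrays stand in for BERT hidden states and trained weights, and the tag set is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_layers(layer_states, w_query):
    # layer_states: (num_layers, seq_len, hidden) hidden states from every
    # transformer layer, not just the final one.
    # w_query: (hidden,) scoring vector; attention weights are computed
    # over the layer axis, separately for each token.
    scores = layer_states @ w_query                           # (num_layers, seq_len)
    weights = softmax(scores, axis=0)                         # per-token layer weights
    fused = (weights[..., None] * layer_states).sum(axis=0)   # (seq_len, hidden)
    return fused

def predict_tags(fused, w_out, tags):
    # Project each token's fused embedding to tag logits, softmax, argmax.
    probs = softmax(fused @ w_out, axis=-1)                   # (seq_len, num_tags)
    return [tags[i] for i in probs.argmax(axis=-1)]

rng = np.random.default_rng(0)
num_layers, seq_len, hidden = 12, 6, 32
tags = ["O", "B-Disease", "I-Disease", "B-Drug", "I-Drug"]  # hypothetical tag set
layer_states = rng.normal(size=(num_layers, seq_len, hidden))
fused = fuse_layers(layer_states, rng.normal(size=hidden))
pred = predict_tags(fused, rng.normal(size=(hidden, len(tags))), tags)
print(pred)
```

In the real model the fusion weights and output projection are learned jointly with fine-tuning; the sketch only shows the data flow from per-layer representations to per-token tag predictions.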
format | Online Article Text |
id | pubmed-7235813 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-7235813 2020-06-01 Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation Zhang, Zhichang; Zhu, Lin; Yu, Peilin JMIR Med Inform Original Paper BACKGROUND: Medical entity recognition is a key technology that supports the development of smart medicine. Existing methods for English medical entity recognition have advanced considerably, but progress on Chinese has been slow. Because of the complexity of the Chinese language and the scarcity of annotated corpora, existing Chinese methods rely on simple neural networks, which can neither effectively extract deep semantic representations from electronic medical records (EMRs) nor be applied to scarce medical corpora. We therefore developed a new Chinese EMR (CEMR) dataset with six types of entities and proposed a multi-level representation learning model based on Bidirectional Encoder Representations from Transformers (BERT) for Chinese medical entity recognition. OBJECTIVE: This study aimed to improve the performance of the language model by having it learn multi-level representations and to recognize Chinese medical entities. METHODS: We investigated the pretrained language representation model and found that utilizing information not only from the final layer but also from intermediate layers improves performance on the Chinese medical entity recognition task. We therefore proposed a multi-level representation learning model for entity recognition in Chinese EMRs. Specifically, we first used the BERT language model to extract semantic representations. Then, a multi-head attention mechanism automatically extracted deeper semantic information from each layer. Finally, the semantic representations produced by multi-level representation extraction were used as the final semantic context embedding for each token, and softmax predicted the entity tags. RESULTS: The best F1 score was 82.11% on the CEMR dataset, and the F1 score on the CCKS (China Conference on Knowledge Graph and Semantic Computing) 2018 benchmark dataset increased further to 83.18%. Comparative experiments showed that our method outperforms previous work and establishes a new state of the art. CONCLUSIONS: The multi-level representation learning model is proposed for the Chinese EMR entity recognition task. Experiments on two clinical datasets demonstrate the usefulness of the multi-head attention mechanism for extracting multi-level representations as part of the language model. JMIR Publications 2020-05-04 /pmc/articles/PMC7235813/ /pubmed/32364514 http://dx.doi.org/10.2196/17637 Text en ©Zhichang Zhang, Lin Zhu, Peilin Yu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 04.05.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
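The F1 scores reported in the record (82.11% on CEMR, 83.18% on CCKS 2018) are entity-level scores. The record does not include the authors' evaluation script, so the following is a generic sketch of strict span-level precision/recall/F1 over BIO tag sequences, the usual convention for this task; the tag names are hypothetical.

```python
def spans(tags):
    # Extract (start, end, type) entity spans from a BIO tag sequence.
    # A stray I- tag with no preceding B- starts a new span (lenient reading).
    out, start, typ = set(), None, None
    for i, t in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        boundary = t == "O" or t.startswith("B-") or (t.startswith("I-") and t[2:] != typ)
        if boundary and start is not None:
            out.add((start, i, typ))
            start, typ = None, None
        if t.startswith("B-") or (t.startswith("I-") and start is None):
            start, typ = i, t[2:]
    return out

def entity_f1(gold, pred):
    # Strict matching: a predicted span counts only if its boundaries
    # and entity type both agree with a gold span.
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

gold = ["B-Disease", "I-Disease", "O", "B-Drug"]
pred = ["B-Disease", "I-Disease", "O", "O"]
print(round(entity_f1(gold, pred), 4))  # → 0.6667 (1 of 1 predictions correct, 1 of 2 entities found)
```

Libraries such as seqeval implement the same strict-matching convention; the sketch just makes the counting explicit.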
spellingShingle | Original Paper Zhang, Zhichang Zhu, Lin Yu, Peilin Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_full | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_fullStr | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_full_unstemmed | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_short | Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation |
title_sort | multi-level representation learning for chinese medical entity recognition: model development and validation |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7235813/ https://www.ncbi.nlm.nih.gov/pubmed/32364514 http://dx.doi.org/10.2196/17637 |
work_keys_str_mv | AT zhangzhichang multilevelrepresentationlearningforchinesemedicalentityrecognitionmodeldevelopmentandvalidation AT zhulin multilevelrepresentationlearningforchinesemedicalentityrecognitionmodeldevelopmentandvalidation AT yupeilin multilevelrepresentationlearningforchinesemedicalentityrecognitionmodeldevelopmentandvalidation |