
A multi-layer soft lattice based model for Chinese clinical named entity recognition


Bibliographic Details
Main Authors: Guo, Shuli, Yang, Wentao, Han, Lina, Song, Xiaowei, Wang, Guowei
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9338545/
https://www.ncbi.nlm.nih.gov/pubmed/35908055
http://dx.doi.org/10.1186/s12911-022-01924-4
author Guo, Shuli
Yang, Wentao
Han, Lina
Song, Xiaowei
Wang, Guowei
collection PubMed
description OBJECTIVE: Named entity recognition (NER) is a key and fundamental part of many medical and clinical tasks, including the establishment of a medical knowledge graph, decision-making support, and question answering systems. When extracting entities from electronic health records (EHRs), NER models mostly apply long short-term memory (LSTM) and achieve surprisingly strong performance in clinical NER. However, these LSTM-based models often require increased network depth to capture long-distance dependencies. As a result, the LSTM-based models that achieve high accuracy generally require long training times and extensive training data, which has obstructed their adoption in clinical scenarios with limited training time. METHOD: Inspired by the Transformer, we combine the Transformer with a Soft Term Position Lattice to form a soft lattice structure Transformer, which models long-distance dependencies similarly to LSTM. Our model consists of four components: the WordPiece module, the BERT module, the soft lattice structure Transformer module, and the CRF module. RESULT: Our experiments demonstrated that this approach increased the F1 score by 1–5% on the CCKS NER task compared to other LSTM-CRF models while consuming less training time. Additional evaluations showed that the soft lattice structure Transformer performs well at recognizing long medical terms, abbreviations, and numbers. The proposed model achieves a 91.6% F-measure in recognizing long medical terms and a 90.36% F-measure in recognizing abbreviations and numbers. CONCLUSIONS: By using the soft lattice structure Transformer, the method proposed in this paper captures Chinese word and lattice information, making the model suitable for Chinese clinical medical records. Transformers with multilayer soft lattice Chinese word construction can capture potential interactions between Chinese characters and words.
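As an illustration of the four-module pipeline named in the description above (WordPiece, BERT, soft lattice structure Transformer, CRF), the following is a minimal Python sketch, not the authors' code: the soft term-position lattice is approximated by a plain Transformer encoder layer, the CRF is reduced to a linear emission projection, and names such as NUM_TAGS and the example sentence are assumptions made only for illustration.

    import torch
    from torch import nn
    from transformers import BertTokenizer, BertModel

    NUM_TAGS = 9  # assumed: BIO tags for four clinical entity types plus "O"

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # WordPiece module
    bert = BertModel.from_pretrained("bert-base-chinese")           # BERT module

    # Stand-in for the soft lattice structure Transformer module: a plain
    # Transformer encoder layer; the paper additionally injects soft
    # term-position lattice information, which is omitted in this sketch.
    lattice_layer = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)

    # Stand-in for the CRF module: a linear projection to per-character tag
    # emissions that a CRF layer would then decode.
    tag_projection = nn.Linear(768, NUM_TAGS)

    sentence = "患者三天前出现持续性胸痛"  # illustrative clinical sentence
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
        hidden = lattice_layer(hidden)             # self-attention over the character sequence
        emissions = tag_projection(hidden)         # (1, seq_len, NUM_TAGS) tag scores
    print(emissions.shape)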
format Online
Article
Text
id pubmed-9338545
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9338545 2022-07-31 A multi-layer soft lattice based model for Chinese clinical named entity recognition. Guo, Shuli; Yang, Wentao; Han, Lina; Song, Xiaowei; Wang, Guowei. BMC Med Inform Decis Mak, Research. BioMed Central 2022-07-30 /pmc/articles/PMC9338545/ /pubmed/35908055 http://dx.doi.org/10.1186/s12911-022-01924-4 Text en © The Author(s) 2022. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
title A multi-layer soft lattice based model for Chinese clinical named entity recognition
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9338545/
https://www.ncbi.nlm.nih.gov/pubmed/35908055
http://dx.doi.org/10.1186/s12911-022-01924-4