
A Mixed Semantic Features Model for Chinese NER with Characters and Words

Bibliographic Details
Main authors: Chang, Ning; Zhong, Jiang; Li, Qing; Zhu, Jiang
Format: Online Article Text
Language: English
Published: 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148203/
http://dx.doi.org/10.1007/978-3-030-45439-5_24
Collection: PubMed
Description: Named Entity Recognition (NER) is an essential part of many natural language processing (NLP) tasks. Existing Chinese NER methods are mostly based on word segmentation or use character sequences as input. However, a single-granularity representation suffers from out-of-vocabulary words and word segmentation errors, and its semantic content is relatively limited. In this paper, we introduce the self-attention mechanism into the BiLSTM-CRF neural network structure for Chinese named entity recognition with two kinds of embeddings, for characters and for words. Unlike other models, our method combines character and word features at the sequence level, and the attention mechanism computes similarity over the entire sequence consisting of both characters and words. Character semantic information and word structure work together to improve the accuracy of word boundary segmentation and to resolve the problem of long-phrase combination. We validate our model on the MSRA and Weibo corpora, and experiments demonstrate that our model can significantly improve performance on the Chinese NER task.
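The description above has self-attention compute similarity over one sequence that holds both character and word representations. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: the helpers `mixed_sequence` and `self_attention` are hypothetical, character and word embeddings are assumed to share one dimension, characters are simply placed before words, and the learned query/key/value projections, the BiLSTM encoder, and the CRF layer are all omitted.

```python
import math
from typing import List

Vector = List[float]


def mixed_sequence(char_vecs: List[Vector], word_vecs: List[Vector]) -> List[Vector]:
    """Hypothetical helper: put character and word embeddings in one sequence.

    Assumptions: characters come first, then words, and both embedding
    tables share a single dimension so they can sit side by side.
    """
    dims = {len(v) for v in char_vecs + word_vecs}
    if len(dims) != 1:
        raise ValueError("character and word embeddings must share one dimension")
    return char_vecs + word_vecs


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with Q = K = V = seq.

    Simplification: a single head and no learned projection matrices.
    """
    if not seq:
        return []
    d = len(seq[0])
    out = []
    for q in seq:
        # Similarity of this position to every position, characters and words alike.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        # Output is the attention-weighted average of all vectors in the sequence.
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


# Toy 2-d vectors (made up for illustration) for the characters 南, 京, 市
# and the words 南京, 市 from one possible segmentation.
chars = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
words = [[0.9, 0.1], [0.0, 1.0]]
context = self_attention(mixed_sequence(chars, words))
print(len(context), len(context[0]))  # 5 2
```

In this sketch every character position also scores against every word position, so a character such as 京 can take part of its output from the word 南京 that covers it. In the model the description outlines, such attention would sit inside a BiLSTM-CRF network, with a CRF layer producing the final tags; neither the BiLSTM nor the CRF is reproduced here.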
ID: pubmed-7148203
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Published in: Advances in Information Retrieval (2020-03-17)
Record date: 2020-04-13
Rights: © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.