
Chinese Unknown Word Recognition for PCFG-LA Parsing



Bibliographic Details
Main authors: Huang, Qiuping; He, Liangye; Wong, Derek F.; Chao, Lidia S.
Format: Online article, text
Language: English
Published: Hindawi Publishing Corporation, 2014
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4032743/
https://www.ncbi.nlm.nih.gov/pubmed/24895681
http://dx.doi.org/10.1155/2014/959328
Collection: PubMed
Abstract: This paper investigates the recognition of unknown words in Chinese parsing and proposes two methods to handle the problem. The first modifies a character-based model: the emission probability of an unknown word is modeled using the first and last characters of the word, with the aim of reducing the POS tag ambiguity of unknown words and thereby improving parsing performance. The second is a novel method based on graph-based semisupervised learning (SSL), whose goal is to discover additional lexical knowledge from a large amount of unlabeled data to help the syntactic parsing of unknown words. It propagates lexical emission probabilities to unknown words over similarity graphs built on the words of the labeled and unlabeled data, and the derived distributions are incorporated into the parsing process. Empirical results on the Penn Chinese Treebank and the TCT Treebank show that the proposed methods are effective in handling unknown words and improve parsing.
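The abstract describes both methods only at a high level, so the Python sketch below illustrates them under stated assumptions; it is not the authors' implementation. Part 1 assumes a simple naive-Bayes-style model in which P(word | tag) is approximated by P(first char | tag) · P(last char | tag) with add-one smoothing, and turns it into a tag distribution for an unseen word. Part 2 swaps in classic iterative label propagation (seed words clamped, other words averaging their neighbours) in place of the paper's graph-based SSL objective, over a similarity graph that is assumed to be given. The function names (`train_char_model`, `unknown_word_tag_dist`, `propagate`), the toy words, tags, and edge weights are all hypothetical.

```python
from collections import Counter, defaultdict

# Part 1 (assumption): naive-Bayes-style unknown-word model where
# P(word | tag) is approximated by P(first char | tag) * P(last char | tag).


def train_char_model(tagged_words):
    """Count first/last characters per POS tag from (word, tag) pairs."""
    tag_counts = Counter()
    first = defaultdict(Counter)  # first[tag][char] -> count
    last = defaultdict(Counter)   # last[tag][char] -> count
    chars = set()
    for word, tag in tagged_words:
        tag_counts[tag] += 1
        first[tag][word[0]] += 1
        last[tag][word[-1]] += 1
        chars.update((word[0], word[-1]))
    return tag_counts, first, last, len(chars)


def unknown_word_tag_dist(word, model):
    """Return a normalized tag distribution P(tag | word) for an unseen word."""
    tag_counts, first, last, n_chars = model
    total = sum(tag_counts.values())
    scores = {}
    for tag, count in tag_counts.items():
        # Add-one smoothing; the extra +1 reserves mass for unseen characters.
        p_first = (first[tag][word[0]] + 1) / (count + n_chars + 1)
        p_last = (last[tag][word[-1]] + 1) / (count + n_chars + 1)
        scores[tag] = (count / total) * p_first * p_last
    z = sum(scores.values())
    return {tag: s / z for tag, s in scores.items()}


# Part 2 (swapped-in technique): classic iterative label propagation over a
# word-similarity graph that is assumed to be built already.


def propagate(graph, seeds, tags, iterations=10):
    """graph: word -> {neighbor: weight}; seeds: word -> {tag: prob}.

    Seed (treebank) words stay clamped to their distributions; every other
    word takes the weighted average of its neighbors' distributions.
    """
    uniform = {t: 1.0 / len(tags) for t in tags}
    dist = {w: dict(seeds.get(w, uniform)) for w in graph}
    for _ in range(iterations):
        new = {}
        for w, nbrs in graph.items():
            if w in seeds:
                new[w] = dict(seeds[w])
                continue
            acc = {t: 0.0 for t in tags}
            total = 0.0
            for v, weight in nbrs.items():
                for t in tags:
                    acc[t] += weight * dist.get(v, uniform).get(t, 0.0)
                total += weight
            new[w] = {t: a / total for t, a in acc.items()} if total else dict(dist[w])
        dist = new
    return dist


if __name__ == "__main__":
    labeled = [("北京", "NR"), ("上海", "NR"), ("学习", "VV"),
               ("研究", "VV"), ("学生", "NN"), ("研究生", "NN")]
    model = train_char_model(labeled)
    print(unknown_word_tag_dist("北海", model))  # "北海" is not in `labeled`

    graph = {"北京": {"北海": 1.0}, "北海": {"北京": 1.0, "学习": 0.2},
             "学习": {"北海": 0.2}}
    seeds = {"北京": {"NR": 1.0}, "学习": {"VV": 1.0}}
    print(propagate(graph, seeds, ["NR", "VV", "NN"])["北海"])
```

In the toy run, 北海 shares its first character with 北京 and its last character with 上海, so Part 1 gives NR a probability of 2/3; in Part 2 it inherits NR mass 1.0/1.2 (about 0.83) from its heavier edge to 北京. The sketch propagates P(tag | word) and ignores the latent subcategories of a PCFG-LA lexicon; converting these distributions into emission probabilities for the parser is left out.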
Institution: National Center for Biotechnology Information
Journal: ScientificWorldJournal (Research Article), Hindawi Publishing Corporation
Date: 2014-04-09
Copyright © 2014 Qiuping Huang et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.