A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation

Bibliographic Details
Main authors: Tran, Phuoc; Dinh, Dien; Nguyen, Hien T.
Format: Online, Article, Text
Language: English
Published: Hindawi Publishing Corporation, 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4942671/
https://www.ncbi.nlm.nih.gov/pubmed/27446207
http://dx.doi.org/10.1155/2016/9821608
author Tran, Phuoc
Dinh, Dien
Nguyen, Hien T.
collection PubMed
description Chinese and Vietnamese are both isolating languages, and in both, words are not delimited by spaces. In machine translation, word segmentation is usually performed first when translating from Chinese or Vietnamese into other languages (typically English), and vice versa. However, when translating between two languages that do not use spaces between words, such as Chinese and Vietnamese, it is an open question whether words should be segmented at all. Because Chinese-Vietnamese is a low-resource language pair, translation systems for it suffer noticeably from data sparsity, which makes the choice of whether to segment even more important. In this paper, we propose a new method for Chinese-to-Vietnamese translation that combines the advantages of character-level and word-level translation. At the word level, a hybrid approach combining statistics and rules is used; at the character level, statistical translation is used. Experimental results show that our method improves translation performance over character-level or word-level translation alone.
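To make the character-level versus word-level distinction in the abstract concrete, here is a minimal sketch of the two kinds of input units for a Chinese sentence. It assumes forward maximum matching as the word segmenter and uses a hypothetical toy dictionary and example sentence. The record does not say which segmenter or lexicon the authors used, so this is an illustration only and not their method.

```python
# Hypothetical toy lexicon for illustration; NOT the lexicon used in the paper.
TOY_DICT = {"我们", "喜欢", "机器", "翻译", "机器翻译"}


def char_level(sentence: str) -> list[str]:
    """Character-level units: every Chinese character is its own token."""
    return list(sentence)


def word_level_fmm(sentence: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Word-level units via forward maximum matching (an assumed, classic
    baseline segmenter): at each position, take the longest lexicon entry,
    falling back to a single character when nothing longer matches."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; n == 1 always succeeds.
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens


sentence = "我们喜欢机器翻译"
print(char_level(sentence))                # ['我', '们', '喜', '欢', '机', '器', '翻', '译']
print(word_level_fmm(sentence, TOY_DICT))  # ['我们', '喜欢', '机器翻译']
```

The character-level view never depends on a lexicon, so it cannot produce segmentation errors. The word-level view yields fewer, more meaningful units, but each unit is rarer, which is where the data sparsity mentioned in the abstract comes from.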
format Online
Article
Text
id pubmed-4942671
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4942671 2016-07-21; Comput Intell Neurosci; Research Article; Hindawi Publishing Corporation 2016 (2016-06-29); /pmc/articles/PMC4942671/; /pubmed/27446207; http://dx.doi.org/10.1155/2016/9821608; Text; en; Copyright © 2016 Phuoc Tran et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4942671/
https://www.ncbi.nlm.nih.gov/pubmed/27446207
http://dx.doi.org/10.1155/2016/9821608