
Mixed-Level Neural Machine Translation

Building the first Russian-Vietnamese neural machine translation system, we faced the problem of choosing a translation unit system on which source and target embeddings are based. Available homogeneous translation unit systems with the same translation unit on the source and target sides do not per...


Bibliographic Details
Main authors: Nguyen, Thien; Nguyen, Huu; Tran, Phuoc
Format: Online Article Text
Language: English
Published: Hindawi 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7722455/
https://www.ncbi.nlm.nih.gov/pubmed/33335545
http://dx.doi.org/10.1155/2020/8859452
_version_ 1783620157474603008
author Nguyen, Thien
Nguyen, Huu
Tran, Phuoc
author_facet Nguyen, Thien
Nguyen, Huu
Tran, Phuoc
author_sort Nguyen, Thien
collection PubMed
description Building the first Russian-Vietnamese neural machine translation system, we faced the problem of choosing a translation unit system on which source and target embeddings are based. Available homogeneous translation unit systems with the same translation unit on the source and target sides do not perfectly suit the investigated language pair. To solve the problem, in this paper, we propose a novel heterogeneous translation unit system, considering linguistic characteristics of the synthetic Russian language and the analytic Vietnamese language. Specifically, we decrease the embedding level on the source side by splitting tokens into subtokens and increase the embedding level on the target side by merging neighboring tokens into supertokens. The experimental results show that the proposed heterogeneous system improves over the existing best homogeneous Russian-Vietnamese translation system by 1.17 BLEU. Our approach could be applied to building translation bots for language pairs with different linguistic characteristics.
format Online
Article
Text
id pubmed-7722455
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-77224552020-12-16 Mixed-Level Neural Machine Translation Nguyen, Thien Nguyen, Huu Tran, Phuoc Comput Intell Neurosci Research Article Building the first Russian-Vietnamese neural machine translation system, we faced the problem of choosing a translation unit system on which source and target embeddings are based. Available homogeneous translation unit systems with the same translation unit on the source and target sides do not perfectly suit the investigated language pair. To solve the problem, in this paper, we propose a novel heterogeneous translation unit system, considering linguistic characteristics of the synthetic Russian language and the analytic Vietnamese language. Specifically, we decrease the embedding level on the source side by splitting tokens into subtokens and increase the embedding level on the target side by merging neighboring tokens into supertokens. The experimental results show that the proposed heterogeneous system improves over the existing best homogeneous Russian-Vietnamese translation system by 1.17 BLEU. Our approach could be applied to building translation bots for language pairs with different linguistic characteristics. Hindawi 2020-11-29 /pmc/articles/PMC7722455/ /pubmed/33335545 http://dx.doi.org/10.1155/2020/8859452 Text en Copyright © 2020 Thien Nguyen et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Nguyen, Thien
Nguyen, Huu
Tran, Phuoc
Mixed-Level Neural Machine Translation
title Mixed-Level Neural Machine Translation
title_full Mixed-Level Neural Machine Translation
title_fullStr Mixed-Level Neural Machine Translation
title_full_unstemmed Mixed-Level Neural Machine Translation
title_short Mixed-Level Neural Machine Translation
title_sort mixed-level neural machine translation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7722455/
https://www.ncbi.nlm.nih.gov/pubmed/33335545
http://dx.doi.org/10.1155/2020/8859452
work_keys_str_mv AT nguyenthien mixedlevelneuralmachinetranslation
AT nguyenhuu mixedlevelneuralmachinetranslation
AT tranphuoc mixedlevelneuralmachinetranslation
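
The description in this record says the source side is split into subtokens while neighboring target tokens are merged into supertokens. The record does not say how the authors segment or merge, so the following is only a toy sketch of the general idea. The suffix list, the compound lexicon, the `@@` and `_` markers, and both function names are hypothetical and are not taken from the paper.

```python
# Toy illustration only: the suffix list and the compound lexicon below are
# hypothetical stand-ins, not the segmentation resources used in the paper.

RU_SUFFIXES = ("ами", "ов", "ы", "а")               # hypothetical Russian endings
VI_COMPOUNDS = {("sinh", "viên"), ("đại", "học")}   # hypothetical two-syllable words


def split_into_subtokens(token, suffixes=RU_SUFFIXES):
    """Lower the source level: split one token into a stem and an ending."""
    # Try longer endings first so the longest matching suffix wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix):
            return [token[: -len(suffix)] + "@@", suffix]
    return [token]


def merge_into_supertokens(tokens, compounds=VI_COMPOUNDS):
    """Raise the target level: join neighboring tokens that form a known compound."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in compounds:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # both tokens were consumed by the supertoken
        else:
            merged.append(tokens[i])
            i += 1
    return merged


source = [sub for tok in "студенты университета".split()
          for sub in split_into_subtokens(tok)]
target = merge_into_supertokens("sinh viên đại học".split())
print(source)
print(target)
```

In this sketch the two Russian words become four subtokens and the four Vietnamese syllables become two supertokens, which moves the two sides toward units of more comparable granularity. A real system would likely rely on a learned subword model and a proper word segmenter instead of these hand-written lists.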