
Heavyweight Statistical Alignment to Guide Neural Translation



Bibliographic Details
Main Authors: Nguyen, Thien, Nguyen, Trang
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9187440/
https://www.ncbi.nlm.nih.gov/pubmed/35694597
http://dx.doi.org/10.1155/2022/6856567
_version_ 1784725169925783552
author Nguyen, Thien
Nguyen, Trang
author_facet Nguyen, Thien
Nguyen, Trang
author_sort Nguyen, Thien
collection PubMed
description Transformer neural models with multihead attention outperform all existing translation models. Nevertheless, some features of traditional statistical models, such as prior alignment between source and target words, prove useful in training state-of-the-art Transformer models. It has been reported that a lightweight prior alignment can effectively guide a single head in the multihead cross-attention sublayer responsible for alignment in Transformer models. In this work, we take a step further by applying heavyweight prior alignments to guide all heads. Specifically, we use a weight of 0.5 for the alignment cost added to the token cost when formulating the overall cost of training a Transformer model, where the alignment cost is defined as the deviation of the attention probability from the prior alignments. Moreover, we increase the role of the prior alignment by computing the attention probability as the average over all heads of the multihead attention sublayer in the penultimate layer of the Transformer model. Experimental results on an English-Vietnamese translation task show that our proposed approach helps train superior Transformer-based translation models. Our Transformer model (25.71 BLEU) outperforms the baseline model (21.34 BLEU) by a large margin of 4.37 BLEU. Case studies by native speakers on selected translation results validate the machine judgment. These results encourage the use of heavyweight prior alignments to improve Transformer-based translation models. This work contributes to the literature on machine translation, especially for less common language pairs. Since the proposal is language-independent, it can be applied to different language pairs, including Slavic languages.
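As a rough illustration of the training objective described in the abstract (a minimal sketch, not the authors' implementation), the Python/PyTorch code below combines a token cost with an alignment cost weighted by 0.5, where the alignment cost measures how far the head-averaged cross-attention of the penultimate layer deviates from a prior statistical alignment. The names alignment_cost, total_cost, attn_heads, and prior_alignment are hypothetical placeholders, and the choice of cross-entropy as the deviation measure is an assumption.

# Illustrative sketch only; assumes PyTorch. Not the authors' code.
import torch

ALIGNMENT_WEIGHT = 0.5  # weight of the alignment cost reported in the abstract

def alignment_cost(attn_heads, prior_alignment, eps=1e-9):
    # attn_heads: cross-attention of the penultimate layer,
    #             shape (num_heads, target_len, source_len)
    # prior_alignment: statistical prior alignment matrix,
    #             shape (target_len, source_len), rows summing to 1
    # Average over all heads, as described in the abstract,
    # rather than supervising a single head.
    attn = attn_heads.mean(dim=0)
    # Cross-entropy between the prior alignment and the averaged attention
    # (one assumed way to measure the "deviation" in the abstract).
    return -(prior_alignment * torch.log(attn + eps)).sum(dim=-1).mean()

def total_cost(token_cost, attn_heads, prior_alignment):
    # Overall training cost: token cost plus the heavily weighted alignment cost.
    return token_cost + ALIGNMENT_WEIGHT * alignment_cost(attn_heads, prior_alignment)

With this weighting, the alignment term contributes as much to each update as roughly half the token cost, which is what distinguishes the "heavyweight" guidance here from the lightweight, single-head guidance reported in earlier work.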
format Online
Article
Text
id pubmed-9187440
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9187440 2022-06-11 Heavyweight Statistical Alignment to Guide Neural Translation Nguyen, Thien Nguyen, Trang Comput Intell Neurosci Research Article Transformer neural models with multihead attention outperform all existing translation models. Nevertheless, some features of traditional statistical models, such as prior alignment between source and target words, prove useful in training state-of-the-art Transformer models. It has been reported that a lightweight prior alignment can effectively guide a single head in the multihead cross-attention sublayer responsible for alignment in Transformer models. In this work, we take a step further by applying heavyweight prior alignments to guide all heads. Specifically, we use a weight of 0.5 for the alignment cost added to the token cost when formulating the overall cost of training a Transformer model, where the alignment cost is defined as the deviation of the attention probability from the prior alignments. Moreover, we increase the role of the prior alignment by computing the attention probability as the average over all heads of the multihead attention sublayer in the penultimate layer of the Transformer model. Experimental results on an English-Vietnamese translation task show that our proposed approach helps train superior Transformer-based translation models. Our Transformer model (25.71 BLEU) outperforms the baseline model (21.34 BLEU) by a large margin of 4.37 BLEU. Case studies by native speakers on selected translation results validate the machine judgment. These results encourage the use of heavyweight prior alignments to improve Transformer-based translation models. This work contributes to the literature on machine translation, especially for less common language pairs. Since the proposal is language-independent, it can be applied to different language pairs, including Slavic languages. Hindawi 2022-06-03 /pmc/articles/PMC9187440/ /pubmed/35694597 http://dx.doi.org/10.1155/2022/6856567 Text en Copyright © 2022 Thien Nguyen and Trang Nguyen. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Nguyen, Thien
Nguyen, Trang
Heavyweight Statistical Alignment to Guide Neural Translation
title Heavyweight Statistical Alignment to Guide Neural Translation
title_full Heavyweight Statistical Alignment to Guide Neural Translation
title_fullStr Heavyweight Statistical Alignment to Guide Neural Translation
title_full_unstemmed Heavyweight Statistical Alignment to Guide Neural Translation
title_short Heavyweight Statistical Alignment to Guide Neural Translation
title_sort heavyweight statistical alignment to guide neural translation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9187440/
https://www.ncbi.nlm.nih.gov/pubmed/35694597
http://dx.doi.org/10.1155/2022/6856567
work_keys_str_mv AT nguyenthien heavyweightstatisticalalignmenttoguideneuraltranslation
AT nguyentrang heavyweightstatisticalalignmenttoguideneuraltranslation