Heavyweight Statistical Alignment to Guide Neural Translation
Main authors: | , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9187440/ https://www.ncbi.nlm.nih.gov/pubmed/35694597 http://dx.doi.org/10.1155/2022/6856567 |
Summary: | Transformer neural models with multi-head attention outperform all existing translation models. Nevertheless, some features of traditional statistical models, such as prior alignment between source and target words, prove useful in training state-of-the-art Transformer models. It has been reported that lightweight prior alignment can effectively guide the single head in the multi-head cross-attention sublayer that is responsible for alignment in Transformer models. In this work, we take a step further by applying heavyweight prior alignments to guide all heads. Specifically, when formulating the overall cost of training a Transformer model, we add the alignment cost to the token cost with a weight of 0.5, where the alignment cost is defined as the deviation of the attention probability from the prior alignments. Moreover, we increase the role of prior alignment by computing the attention probability as the average over all heads of the multi-head attention sublayer in the penultimate layer of the Transformer model. Experimental results on an English-Vietnamese translation task show that our proposed approach helps train superior Transformer-based translation models. Our Transformer model (25.71 BLEU) outperforms the baseline model (21.34 BLEU) by a large margin of 4.37 BLEU points. Case studies by native speakers on some translation results validate the machine judgment. The results so far encourage the use of heavyweight prior alignments to improve Transformer-based translation models. This work contributes to the machine translation literature, especially for less widely studied language pairs. Since the proposal in this work is language-independent, it can be applied to other language pairs, including Slavic languages. |
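The summary describes the training objective only in words: an overall cost equal to the token cost plus 0.5 times an alignment cost, where the attention probability is the average over all cross-attention heads of the penultimate layer. The following is a minimal pure-Python sketch of that objective, not the authors' implementation. It assumes cross-entropy against the row-normalized prior alignment as the "deviation" measure, which the summary does not name. The function names (`average_heads`, `alignment_cost`, `overall_cost`) are hypothetical. A real model would compute these quantities on batched tensors inside the training loop.

```python
import math
from typing import List

# Hypothetical type for this sketch: one attention matrix per head,
# rows = target positions, columns = source positions.
Matrix = List[List[float]]


def average_heads(heads: List[Matrix]) -> Matrix:
    """Average the attention probabilities of all heads in one sublayer."""
    n = len(heads)
    rows, cols = len(heads[0]), len(heads[0][0])
    return [[sum(h[i][j] for h in heads) / n for j in range(cols)]
            for i in range(rows)]


def alignment_cost(attn: Matrix, prior: Matrix, eps: float = 1e-9) -> float:
    """ASSUMED deviation measure: cross-entropy of each attention row against
    the row-normalized prior alignment, averaged over aligned target tokens."""
    total, counted = 0.0, 0
    for a_row, p_row in zip(attn, prior):
        mass = sum(p_row)
        if mass == 0:
            continue  # target token with no prior alignment: no signal
        total -= sum((p / mass) * math.log(a + eps)
                     for a, p in zip(a_row, p_row))
        counted += 1
    return total / counted if counted else 0.0


def overall_cost(token_cost: float, layers: List[List[Matrix]],
                 prior: Matrix, weight: float = 0.5) -> float:
    """Token cost plus weighted alignment cost. Attention is averaged over
    all cross-attention heads of the penultimate decoder layer (layers[-2])."""
    attn = average_heads(layers[-2])
    return token_cost + weight * alignment_cost(attn, prior)


# Toy usage: three decoder layers, two heads in the penultimate one.
other = [[[0.5, 0.5], [0.5, 0.5]]]      # placeholder heads for other layers
penult = [[[0.9, 0.1], [0.2, 0.8]],
          [[0.7, 0.3], [0.4, 0.6]]]
prior = [[1, 0], [0, 1]]                # target word i aligned to source word i
cost = overall_cost(2.0, [other, penult, other], prior)  # ≈ 2.145
```

Averaging the heads before measuring the deviation means the prior alignment supervises every head of that layer jointly, rather than a single designated head as in the lightweight approach the summary contrasts with.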