An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention
Main Authors: | Li, Dongxing; Luo, Zuying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2022 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9239798/ https://www.ncbi.nlm.nih.gov/pubmed/35774445 http://dx.doi.org/10.1155/2022/2998242 |
_version_ | 1784737384467791872 |
---|---|
author | Li, Dongxing; Luo, Zuying |
author_facet | Li, Dongxing; Luo, Zuying |
author_sort | Li, Dongxing |
collection | PubMed |
description | Transformer-based models have achieved significant advances in neural machine translation (NMT). The main component of the transformer is the multihead attention layer. In theory, more heads enhance the expressive power of the NMT model, but this is not always the case in practice. On the one hand, the computations of each attention head are conducted in the same subspace, without considering the different subspaces of all the tokens. On the other hand, a low-rank bottleneck may occur when the number of heads surpasses a threshold. To address the low-rank bottleneck, the two mainstream methods either make the head size equal to the sequence length or complicate the distribution of self-attention heads. However, these methods are challenged by the variable sequence length in the corpus and the sheer number of parameters to be learned. Therefore, this paper proposes the interacting-head attention mechanism, which induces deeper and wider interactions across the attention heads through low-dimension computations in different subspaces of all the tokens, and chooses an appropriate number of heads to avoid the low-rank bottleneck. The proposed model was tested on the machine translation tasks of IWSLT2016 DE-EN, WMT17 EN-DE, and WMT17 EN-CS. Compared to the original multihead attention, our model improved performance by 2.78 BLEU/0.85 WER/2.90 METEOR/2.65 ROUGE_L/0.29 CIDEr/2.97 YiSi and 2.43 BLEU/1.38 WER/3.05 METEOR/2.70 ROUGE_L/0.30 CIDEr/3.59 YiSi on the evaluation set and the test set, respectively, for IWSLT2016 DE-EN; 2.31 BLEU/5.94 WER/1.46 METEOR/1.35 ROUGE_L/0.07 CIDEr/0.33 YiSi and 1.62 BLEU/6.04 WER/1.39 METEOR/0.11 CIDEr/0.87 YiSi on the evaluation set and newstest2014, respectively, for WMT17 EN-DE; and 3.87 BLEU/3.05 WER/9.22 METEOR/3.81 ROUGE_L/0.36 CIDEr/4.14 YiSi and 4.62 BLEU/2.41 WER/9.82 METEOR/4.82 ROUGE_L/0.44 CIDEr/5.25 YiSi on the evaluation set and newstest2014, respectively, for WMT17 EN-CS. |
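The low-rank bottleneck mentioned in the abstract follows from the arithmetic of standard multihead attention: with model width d_model split across h heads, each head projects queries and keys into d_model/h dimensions, so each head's attention-score matrix QKᵀ has rank at most d_model/h; once h is large enough that d_model/h falls below the sequence length, no head can realize a full-rank attention map. A minimal NumPy sketch of vanilla multihead self-attention (not the paper's interacting-head variant; random matrices stand in for learned weights) illustrates the mechanics:

```python
import numpy as np

def multihead_attention(X, num_heads, seed=0):
    """Standard multihead self-attention with random stand-in weights."""
    rng = np.random.default_rng(seed)
    seq_len, d_model = X.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads  # head size shrinks as head count grows
    outputs = []
    for _ in range(num_heads):
        Wq = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Wk = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Wv = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # scores is (seq_len, seq_len) but has rank <= d_head, since it is
        # a product of two rank-<=d_head factors: the low-rank bottleneck
        # appears once d_head = d_model/num_heads < seq_len.
        scores = Q @ K.T / np.sqrt(d_head)
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)  # row-wise softmax
        outputs.append(attn @ V)
    # concatenate heads back to the model width
    return np.concatenate(outputs, axis=-1)

X = np.random.default_rng(1).standard_normal((10, 64))
out = multihead_attention(X, num_heads=8)  # d_head = 8 < seq_len = 10
```

With 8 heads over a 64-dimensional model and 10 tokens, each head's score matrix is 10×10 but rank-limited to 8, which is the situation the paper's choice of head count is designed to avoid.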
format | Online Article Text |
id | pubmed-9239798 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9239798 2022-06-29 An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention Li, Dongxing; Luo, Zuying Comput Intell Neurosci Research Article Hindawi 2022-06-21 /pmc/articles/PMC9239798/ /pubmed/35774445 http://dx.doi.org/10.1155/2022/2998242 Text en Copyright © 2022 Dongxing Li and Zuying Luo. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Li, Dongxing Luo, Zuying An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention |
title | An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention |
title_full | An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention |
title_fullStr | An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention |
title_full_unstemmed | An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention |
title_short | An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention |
title_sort | improved transformer-based neural machine translation strategy: interacting-head attention |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9239798/ https://www.ncbi.nlm.nih.gov/pubmed/35774445 http://dx.doi.org/10.1155/2022/2998242 |
work_keys_str_mv | AT lidongxing animprovedtransformerbasedneuralmachinetranslationstrategyinteractingheadattention AT luozuying animprovedtransformerbasedneuralmachinetranslationstrategyinteractingheadattention AT lidongxing improvedtransformerbasedneuralmachinetranslationstrategyinteractingheadattention AT luozuying improvedtransformerbasedneuralmachinetranslationstrategyinteractingheadattention |